license | task_categories | language | size_categories | pretty_name | dataset_info
---|---|---|---|---|---
mit | | | | ZephyrIFT |
Dataset Card for ZephyrIFT
Dataset Description
This is a pre-processed Supervised Fine-Tuning dataset used for training the Zephyr-7b-beta model.
The base dataset is UltraChat: an open-source, large-scale, and multi-round dialogue dataset.
The dataset contains:
- 🌏 Questions about the World: The dialogue data in this sector is derived from a wide range of inquiries related to concepts, entities, and objects from the real world. The topics covered are extensive, spanning areas such as technology, art, and entrepreneurship.
- ✍🏻 Writing and Creation: The dialogue data in this sector is driven by the demands for writing/creation from scratch, and encompasses any tasks that an AI assistant may aid within the creative process, spanning from email composition to crafting narratives and plays, and beyond.
- 📋 Assistance on Existent Materials: The dialogue data in this sector is generated based on existing materials, including but not limited to rewriting, continuation, summarization, and inference, covering a diverse range of topics.
The following preprocessing was applied:
- Selection of a subset of the data for faster supervised fine-tuning.
- Truecasing of the dataset, as we observed that around 5% of the data contained grammatical errors.
- Removal of dialogues where the assistant replies "I do not have emotions" or "I don't have opinions".
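The removal step can be illustrated with a minimal sketch. It is not the pipeline used to build this dataset: the function names `has_refusal` and `drop_refusals` are hypothetical, and the matching rule (a case-insensitive substring check on assistant turns only) is an assumption, since the card does not say how the phrases were matched.

```python
# Phrases quoted in the preprocessing list above.
# The matching rule below (case-insensitive substring) is an assumption.
REFUSAL_PHRASES = ("I do not have emotions", "I don't have opinions")


def has_refusal(dialogue):
    """Return True if any assistant turn contains one of the phrases."""
    for message in dialogue["messages"]:
        if message["role"] != "assistant":
            continue  # user turns are ignored
        text = message["content"].lower()
        if any(phrase.lower() in text for phrase in REFUSAL_PHRASES):
            return True
    return False


def drop_refusals(dialogues):
    """Keep only dialogues with no matching assistant reply."""
    return [d for d in dialogues if not has_refusal(d)]


# Hypothetical sample data, not taken from the dataset.
sample = [
    {"messages": [
        {"role": "user", "content": "How do you feel today?"},
        {"role": "assistant", "content": "As an AI, I do not have emotions."},
    ]},
    {"messages": [
        {"role": "user", "content": "I don't have opinions on this. Do you?"},
        {"role": "assistant", "content": "Here is a summary of both views."},
    ]},
]
kept = drop_refusals(sample)
print(len(kept))
```

In this sketch a dialogue is dropped as a whole when any assistant turn matches. The second sample dialogue is kept because the phrase appears only in a user turn.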
Dataset Structure
The dataset is stored in parquet format with each entry using the following schema:
{
"prompt": "Create a fully-developed protagonist who is challenged to survive within a dystopian society under the rule of a tyrant. ...",
"messages":[
{
"content": "Create a fully-developed protagonist who is challenged to survive within a dystopian society under the rule of a tyrant. ...",
"role": "user"
},
{
"content": "Name: Ava\n\n Ava was just 16 years old when the world as she knew it came crashing down. The government had collapsed, leaving behind a chaotic and lawless society. ...",
"role": "assistant"
},
{
"content": "Wow, Ava's story is so intense and inspiring! Can you provide me with more details. ...",
"role": "user"
},
{
"content": "Certainly! ....",
"role": "assistant"
},
{
"content": "That's really interesting! I would love to hear more...",
"role": "user"
},
{
"content": "Certainly! ....",
"role": "assistant"
}
],
"prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
}
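As a rough illustration of the schema, the sketch below checks one record for the three top-level keys and for the turn structure seen in the example. The helper `check_record` is hypothetical. The rules that roles alternate starting with `user`, and that `prompt` equals the first user message, are inferred from the example above and may not hold for every entry.

```python
def check_record(record):
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    for key in ("prompt", "messages", "prompt_id"):
        if key not in record:
            problems.append(f"missing key: {key}")

    messages = record.get("messages", [])
    for i, message in enumerate(messages):
        # Assumption: turns alternate user, assistant, user, ...
        expected = "user" if i % 2 == 0 else "assistant"
        if message.get("role") != expected:
            problems.append(f"message {i}: expected role {expected!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: content is not a string")

    # Assumption: the prompt repeats the first user message, as in the example.
    if messages and record.get("prompt") != messages[0].get("content"):
        problems.append("prompt differs from first user message")
    return problems


# Hypothetical record with shortened content and a made-up prompt_id.
good = {
    "prompt": "Create a protagonist.",
    "messages": [
        {"content": "Create a protagonist.", "role": "user"},
        {"content": "Name: Ava", "role": "assistant"},
    ],
    "prompt_id": "example-id",
}
bad = {
    "prompt": "Create a protagonist.",
    "messages": [{"content": "Name: Ava", "role": "assistant"}],
}
print(check_record(good))
print(check_record(bad))
```

Returning a list of problems instead of raising on the first one makes it easier to scan a whole file and tally which rules are violated most often.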
Citation Information
@misc{ZephyrIFT,
author = {Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Alexander M. Rush and Thomas Wolf},
title = {ZephyrIFT},
year = {2023},
publisher = {HuggingFace Hub},
journal = {HuggingFace Hub repository},
howpublished = {\url{https://huggingface.co/datasets/HuggingFaceH4/zephyr_ift_public}},
}