Update README.md
parent ca063fbbc4
commit 9ad6607fa9
README.md | 78
@@ -1,4 +1,13 @@
---
license: mit
task_categories:
- conversational
- text-generation
language:
- en
size_categories:
- 100K<n<1M
pretty_name: ZephyrIFT
dataset_info:
  features:
  - name: prompt
@@ -21,6 +30,71 @@ dataset_info:
  download_size: 813207030
  dataset_size: 1551754213
---
# Dataset Card for "ultrachat_200k"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
# Dataset Card for Dataset Name
## Dataset Description

This is a pre-processed Instruction Fine-Tuning dataset used for training the Zephyr-7b-beta model.

The base dataset is [UltraChat](https://github.com/thunlp/UltraChat): an open-source, large-scale, multi-round dialogue dataset.

The dataset contains:
- 🌏 **Questions about the World**: The dialogue data in this sector is derived from a wide range of inquiries related to concepts, entities, and objects from the real world. The topics covered are extensive, spanning areas such as technology, art, and entrepreneurship.
- ✍🏻 **Writing and Creation**: The dialogue data in this sector is driven by demands for writing or creation from scratch, and encompasses any task an AI assistant may help with in the creative process, from email composition to crafting narratives and plays, and beyond.
- 📋 **Assistance on Existent Materials**: The dialogue data in this sector is generated based on existing materials, including but not limited to rewriting, continuation, summarization, and inference, covering a diverse range of topics.

The following preprocessing was applied:
- Selection of a subset of data for faster supervised fine-tuning.
- Truecasing of the dataset, as we observed that around 5% of the data contained grammatical errors.
- Removal of dialogues where the assistant replies with phrases such as "I do not have emotions" or "I don't have opinions" (TO BE CONFIRMED AFTER EXPERIMENTS).
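As a rough illustration of the last preprocessing step, the sketch below drops dialogues in which any assistant turn contains one of the listed phrases. It is not the script that was used to build the dataset: the names `REFUSAL_PHRASES`, `has_refusal`, and `filter_dialogues` are hypothetical, the phrase list holds only the two examples quoted above, and case-insensitive substring matching is an assumption.

```python
# Hypothetical phrase list: only the two examples quoted in this card.
# The final set is still to be confirmed.
REFUSAL_PHRASES = ("i do not have emotions", "i don't have opinions")


def has_refusal(messages):
    """Return True if any assistant turn contains one of the phrases."""
    for message in messages:
        if message["role"] != "assistant":
            continue  # user turns are never checked
        text = message["content"].lower()
        if any(phrase in text for phrase in REFUSAL_PHRASES):
            return True
    return False


def filter_dialogues(records):
    """Keep only records whose assistant turns contain no listed phrase."""
    return [r for r in records if not has_refusal(r["messages"])]


# Two made-up records in the same shape as the dataset entries.
sample = [
    {"prompt": "Tell me a story.", "messages": [
        {"role": "user", "content": "Tell me a story."},
        {"role": "assistant", "content": "Once upon a time ..."},
    ]},
    {"prompt": "How do you feel?", "messages": [
        {"role": "user", "content": "How do you feel?"},
        {"role": "assistant", "content": "As an AI, I do not have emotions."},
    ]},
]
kept = filter_dialogues(sample)
print(len(kept))  # 1
```

Matching on assistant turns only keeps dialogues in which the user merely mentions one of the phrases.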
## Dataset Structure

The dataset is stored in Parquet format, with each entry using the following schema:
```
{
    "prompt": "Create a fully-developed protagonist who is challenged to survive within a dystopian society under the rule of a tyrant. ...",
    "messages": [
        {
            "content": "Create a fully-developed protagonist who is challenged to survive within a dystopian society under the rule of a tyrant. ...",
            "role": "user"
        },
        {
            "content": "Name: Ava\n\n Ava was just 16 years old when the world as she knew it came crashing down. The government had collapsed, leaving behind a chaotic and lawless society. ...",
            "role": "assistant"
        },
        {
            "content": "Wow, Ava's story is so intense and inspiring! Can you provide me with more details. ...",
            "role": "user"
        },
        {
            "content": "Certainly! ....",
            "role": "assistant"
        },
        {
            "content": "That's really interesting! I would love to hear more...",
            "role": "user"
        },
        {
            "content": "Certainly! ....",
            "role": "assistant"
        }
    ],
    "prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
}
```

### Citation Information

```bibtex
@misc{ZephyrIFT,
  author = {Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Alexander M. Rush and Thomas Wolf},
  title = {ZephyrIFT},
  year = {2023},
  publisher = {HuggingFace Hub},
  journal = {HuggingFace Hub repository},
  howpublished = {\url{https://huggingface.co/datasets/HuggingFaceH4/zephyr_ift_public}},
}
```
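
Returning to the record layout shown under Dataset Structure, the following sketch checks a single entry against that layout. It is illustrative only: `validate_record` is a hypothetical helper, and two of its checks (turns alternate starting with the user, and the first turn repeats `prompt`) are assumptions inferred from the one example record rather than documented guarantees.

```python
def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for key in ("prompt", "messages", "prompt_id"):
        if key not in record:
            problems.append(f"missing key: {key}")
    if problems:
        return problems

    messages = record["messages"]
    if not messages:
        return ["messages is empty"]

    # Assumption from the example: turns alternate, starting with the user.
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message.get("role") != expected:
            problems.append(f"turn {index}: expected role {expected!r}")

    # Assumption from the example: the first turn repeats the prompt.
    if messages[0].get("content") != record["prompt"]:
        problems.append("first turn does not match prompt")
    return problems


# A made-up record in the documented shape.
good = {
    "prompt": "Hello",
    "messages": [
        {"content": "Hello", "role": "user"},
        {"content": "Hi there!", "role": "assistant"},
    ],
    "prompt_id": "abc123",
}
print(validate_record(good))  # []
```

Returning a list of problems instead of raising makes it easy to scan many records and report every issue at once.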