Update README.md
This commit is contained in:
parent
9ad6607fa9
commit
666b81f100
@ -35,7 +35,7 @@ dataset_info:
|
|||||||
|
|
||||||
## Dataset Description
|
## Dataset Description
|
||||||
|
|
||||||
This is a pre-processed Intruction Fine-Tuning dataset used for training the Zephyr-7b-beta model.
|
This is a pre-processed Supervised Fine-Tuning dataset used for training the Zephyr-7b-beta model.
|
||||||
|
|
||||||
The base dataset is [UltraChat](https://github.com/thunlp/UltraChat): an open-source, large-scale, and multi-round dialogue dataset.
|
The base dataset is [UltraChat](https://github.com/thunlp/UltraChat): an open-source, large-scale, and multi-round dialogue dataset.
|
||||||
|
|
||||||
@ -47,7 +47,7 @@ The dataset contains:
|
|||||||
The following preprocessing was applied:
|
The following preprocessing was applied:
|
||||||
- Selection of a subset of data for faster supervised fine tuning.
|
- Selection of a subset of data for faster supervised fine tuning.
|
||||||
- Truecasing of the dataset, as we observed around %5 of the data contained grammatical errors.
|
- Truecasing of the dataset, as we observed around %5 of the data contained grammatical errors.
|
||||||
- Removal of dialogues where the assitant replies "I do not have emotions", "I don't have opinions" ...etc (TO BE CONFIRMED AFTER EXPS)
|
- Removal of dialogues where the assistant replies "I do not have emotions", "I don't have opinions"
|
||||||
|
|
||||||
## Dataset Structure
|
## Dataset Structure
|
||||||
|
|
||||||
@ -84,7 +84,7 @@ The dataset is stored in parquet format with each entry using the following sche
|
|||||||
],
|
],
|
||||||
"prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
|
"prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
|
||||||
}
|
}
|
||||||
|
```
|
||||||
|
|
||||||
### Citation Information
|
### Citation Information
|
||||||
|
|
||||||
|
Loading…
x
Reference in New Issue
Block a user