Update README.md

2023-10-24 08:28:09 +00:00 · 666b81f100
commit 666b81f100
parent 9ad6607fa9
1 changed files with 3 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ dataset_info:

 ## Dataset Description

-This is a pre-processed Intruction Fine-Tuning dataset used for training the Zephyr-7b-beta model.
+This is a pre-processed Supervised Fine-Tuning dataset used for training the Zephyr-7b-beta model.

 The base dataset is [UltraChat](https://github.com/thunlp/UltraChat): an open-source, large-scale, and multi-round dialogue dataset.

@@ -47,7 +47,7 @@ The dataset contains:
 The following preprocessing was applied:
 - Selection of a subset of data for faster supervised fine tuning.
 - Truecasing of the dataset, as we observed around %5 of the data contained grammatical errors.
-- Removal of dialogues where the assitant replies "I do not have emotions", "I don't have opinions" ...etc (TO BE CONFIRMED AFTER EXPS)
+- Removal of dialogues where the assistant replies "I do not have emotions", "I don't have opinions"

 ## Dataset Structure

@@ -84,7 +84,7 @@ The dataset is stored in parquet format with each entry using the following sche
    ],
    "prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
 }
-
+```

 ### Citation Information