From 666b81f1004bd109d314bf016fa8d394e72079d6 Mon Sep 17 00:00:00 2001
From: Edward Beeching
Date: Tue, 24 Oct 2023 08:28:09 +0000
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 30477bc..fe026dd 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ dataset_info:

 ## Dataset Description

-This is a pre-processed Intruction Fine-Tuning dataset used for training the Zephyr-7b-beta model.
+This is a pre-processed Supervised Fine-Tuning dataset used for training the Zephyr-7b-beta model.

 The base dataset is [UltraChat](https://github.com/thunlp/UltraChat): an open-source, large-scale, and multi-round dialogue dataset.

@@ -47,7 +47,7 @@ The dataset contains:
 The following preprocessing was applied:
 - Selection of a subset of data for faster supervised fine tuning.
 - Truecasing of the dataset, as we observed around %5 of the data contained grammatical errors.
-- Removal of dialogues where the assitant replies "I do not have emotions", "I don't have opinions" ...etc (TO BE CONFIRMED AFTER EXPS)
+- Removal of dialogues where the assistant replies "I do not have emotions", "I don't have opinions"

 ## Dataset Structure

@@ -84,7 +84,7 @@ The dataset is stored in parquet format with each entry using the following sche
 ],
 "prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
 }
-
+```

 ### Citation Information