From 79f6639fbde6c05af0d422adc56d7e85d6a18c8c Mon Sep 17 00:00:00 2001
From: Matthew Hayes
Date: Wed, 12 Apr 2023 09:13:47 +0000
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 0a5b7eb..20e90f6 100644
--- a/README.md
+++ b/README.md
@@ -9,8 +9,8 @@ inference: false
 ## Summary
 
 Databricks’ `dolly-v2-12b`, an instruction-following large language model trained on the Databricks machine learning platform
-that is licensed for commercial use. based on `pythia-12b`, Dolly is trained on ~15k instruction/response fine tuning records
-[`databricks-dolly-15k`](https://huggingface.co/datasets/databricks/databricks-dolly-15k) generated
+that is licensed for commercial use. Based on `pythia-12b`, Dolly is trained on ~15k instruction/response fine tuning records
+[`databricks-dolly-15k`](https://github.com/databrickslabs/dolly/tree/master/data) generated
 by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation,
 information extraction, open QA and summarization. `dolly-v2-12b` is not a state-of-the-art model, but does exhibit surprisingly
 high quality instruction following behavior not characteristic of the foundation model on which it is based.
@@ -20,7 +20,7 @@
 ## Model Overview
 `dolly-v2-12b` is a 12 billion parameter causal language model created by [Databricks](https://databricks.com/) that is derived from
 [EleutherAI’s](https://www.eleuther.ai/) [Pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) and fine-tuned
-on a ~15K record instruction corpus generated by Databricks employees and released under a permissive license (CC-BY-SA)
+on a [~15K record instruction corpus](https://github.com/databrickslabs/dolly/tree/master/data) generated by Databricks employees and released under a permissive license (CC-BY-SA)
 
 ## Usage