From bf7d7cc42834f17dd65537ceaae109101b77282a Mon Sep 17 00:00:00 2001
From: Bleys
Date: Sat, 1 Jul 2023 02:43:16 +0000
Subject: [PATCH] Update README.md
---
README.md | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/README.md b/README.md
index 47c6af4..9a5e487 100644
--- a/README.md
+++ b/README.md
@@ -18,8 +18,8 @@ size_categories:
- 10M🐋 The Open Orca Dataset! 🐋
-
+
We are thrilled to announce the release of the Open Orca dataset!
This rich collection of augmented FLAN data aligns, as best as possible, with the distributions outlined in the [Orca paper](https://arxiv.org/abs/2306.02707).
It has been instrumental in generating high-performing model checkpoints and serves as a valuable resource for all NLP researchers and developers!
+
+
+Dataset Summary
+
+The Open Orca dataset is a collection of unaugmented and augmented FLAN data.
+It currently contains ~1M GPT-4 completions and ~3.5M GPT-3.5 completions.
+It is tabularized in alignment with the distributions presented in the Orca paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope.
+The data is primarily used for training and evaluation in the field of natural language processing.
+
+
+
+Dataset Attribution
+
We would like to give special recognition to the following contributors for their significant efforts and dedication:
@@ -70,15 +83,6 @@ Many thanks to NanoBit and Caseus, makers of [Axolotl](https://github.com/OpenAc
We are welcoming sponsors or collaborators to help us build these models to the scale they deserve. Please reach out via our socials:
http://Alignmentlab.ai https://discord.gg/n9hXaBPWxx
-
-
-Dataset Summary
-
-The Open Orca dataset is a collection of unaugmented and augmented FLAN data.
-Currently ~1M GPT-4 completions, and ~3.5M GPT-3.5 completions.
-It is tabularized in alignment with the distributions presented in the ORCA paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope.
-The data is primarily used for training and evaluation in the field of natural language processing.
-
Supported Tasks and Leaderboards