Update README.md

2023-04-15 12:36:25 +00:00 · 2023-04-15 12:36:25 +00:00 · dee0b5f871
commit dee0b5f871
parent f8b2b76b6c
1 changed file with 16 additions and 77 deletions
--- a/README.md
+++ b/README.md
@@ -113,7 +113,7 @@ size_categories:
 - 10K<n<100K
 ---

-# Dataset Card for OASST1
+# OpenAssistant Conversations Dataset (OASST1)

 ## Dataset Description

@@ -129,6 +129,21 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
 35 different languages, annotated with 461,292 quality ratings. The corpus is a product
 of a worldwide crowd-sourcing effort involving over 13,500 volunteers.

+The dataset was exported from the open-assistant.io production database on April 12, 2023.
+
+### Dataset Structure
+
+This dataset contains demonstrations of human-assistant conversations that were collected
+on the open-assistant.io website.
+
+All conversations are exported as message trees whose nodes are conversation messages. Each message has a
+role, which is either "assistant" or "prompter". The root node of a message tree is called the initial prompt.
+In completed trees, when a message has at least two replies, those replies carry a `rank` field that indicates the users' consensus preference among them.
+The lower the rank, the better the message.
+
+
+
+

 ### Languages

@@ -176,79 +191,3 @@ OpenAssistant Conversations incorporates 35 different languages with a distribut
    <li>Slovak: 19</li>
  </ul>
 </details>
-
-## Dataset Structure
-
-### Data Instances
-
-[More Information Needed]
-
-### Data Fields
-
-[More Information Needed]
-
-### Data Splits
-
-[More Information Needed]
-
-## Dataset Creation
-
-### Curation Rationale
-
-[More Information Needed]
-
-### Source Data
-
-#### Initial Data Collection and Normalization
-
-[More Information Needed]
-
-#### Who are the source language producers?
-
-[More Information Needed]
-
-### Annotations
-
-#### Annotation process
-
-[More Information Needed]
-
-#### Who are the annotators?
-
-[More Information Needed]
-
-### Personal and Sensitive Information
-
-[More Information Needed]
-
-## Considerations for Using the Data
-
-### Social Impact of Dataset
-
-[More Information Needed]
-
-### Discussion of Biases
-
-[More Information Needed]
-
-### Other Known Limitations
-
-[More Information Needed]
-
-## Additional Information
-
-### Dataset Curators
-
-[More Information Needed]
-
-### Licensing Information
-
-[More Information Needed]
-
-### Citation Information
-
-[More Information Needed]
-
-### Contributions
-
-[More Information Needed]
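As a rough illustration of the message-tree structure described in the new Dataset Structure section, the following Python sketch models each node with the `role` and `rank` fields the README mentions plus a list of replies, then walks from the initial prompt along the best-ranked reply at each level. The `MessageNode` class, its `text` and `replies` fields, and the `best_reply` / `preferred_thread` helpers are hypothetical and do not describe the actual export format. Treating `rank` as a property of each reply relative to its siblings, and falling back to the first reply when no ranks are present, are assumptions of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MessageNode:
    """Hypothetical minimal message node (not the real export schema)."""
    text: str
    role: str  # "prompter" or "assistant"
    rank: Optional[int] = None  # assumed: lower is better, None when unranked
    replies: list[MessageNode] = field(default_factory=list)


def best_reply(node: MessageNode) -> Optional[MessageNode]:
    """Return the reply with the lowest rank; fall back to the first reply if none are ranked."""
    if not node.replies:
        return None
    ranked = [r for r in node.replies if r.rank is not None]
    if ranked:
        return min(ranked, key=lambda r: r.rank)
    return node.replies[0]  # assumption: incomplete/unranked trees keep reply order


def preferred_thread(root: MessageNode) -> list[tuple[str, str]]:
    """Follow best-ranked replies from the initial prompt down to a leaf."""
    thread: list[tuple[str, str]] = []
    node: Optional[MessageNode] = root
    while node is not None:
        thread.append((node.role, node.text))
        node = best_reply(node)
    return thread


if __name__ == "__main__":
    # Toy tree: the initial prompt has two ranked assistant replies.
    tree = MessageNode("What is 2 + 2?", "prompter", replies=[
        MessageNode("5", "assistant", rank=1),
        MessageNode("2 + 2 = 4.", "assistant", rank=0, replies=[
            MessageNode("Thanks!", "prompter"),
        ]),
    ])
    print(preferred_thread(tree))
    # prints [('prompter', 'What is 2 + 2?'), ('assistant', '2 + 2 = 4.'), ('prompter', 'Thanks!')]
```

Picking the lowest-ranked reply at every level yields a single prompter/assistant thread per tree; a fuller treatment could instead enumerate every root-to-leaf path and keep the ranks as preference signals.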