Update README.md

2023-04-15 12:36:25 +00:00 · 2023-04-15 12:36:25 +00:00 · dee0b5f871
commit dee0b5f871
parent f8b2b76b6c
1 changed files with 16 additions and 77 deletions
--- a/README.md
+++ b/README.md
@ -113,7 +113,7 @@ size_categories:
 - 10K<n<100K
 ---
-# Dataset Card for OASST1
+# OpenAssistant Conversations Dataset (OASST1)
 ## Dataset Description
@ -129,6 +129,21 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
 35 different languages, annotated with 461,292 quality ratings. The corpus is a product
 of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
 The dataset was exported from the open-assistant.io production database on April, 12 2023.
 ### Dataset Structure
 Thes dataset contains demonstrations of of human-assistant conversations that were collected
 on the open-assistant.io website.
 All conversations are exported as message trees which contain conversation messages nodes. Each message has a
 role which can either be "assistant" or "prompter". The root node of a message tree is called the initial prompt. 
 Nodes with at least two replies of completed trees have a `rank` field which indicates the users' preference consensus.
 The lower the rank the better the message.
 ### Languages
@ -176,79 +191,3 @@ OpenAssistant Conversations incorporates 35 different languages with a distribut
    <li>Slovak: 19</li>
  </ul>
 </details>
 ## Dataset Structure
 ### Data Instances
 [More Information Needed]
 ### Data Fields
 [More Information Needed]
 ### Data Splits
 [More Information Needed]
 ## Dataset Creation
 ### Curation Rationale
 [More Information Needed]
 ### Source Data
 #### Initial Data Collection and Normalization
 [More Information Needed]
 #### Who are the source language producers?
 [More Information Needed]
 ### Annotations
 #### Annotation process
 [More Information Needed]
 #### Who are the annotators?
 [More Information Needed]
 ### Personal and Sensitive Information
 [More Information Needed]
 ## Considerations for Using the Data
 ### Social Impact of Dataset
 [More Information Needed]
 ### Discussion of Biases
 [More Information Needed]
 ### Other Known Limitations
 [More Information Needed]
 ## Additional Information
 ### Dataset Curators
 [More Information Needed]
 ### Licensing Information
 [More Information Needed]
 ### Citation Information
 [More Information Needed]
 ### Contributions
 [More Information Needed]