Update README.md

Andreas Köpf 2023-04-15 12:36:25 +00:00 committed by huggingface-web
parent f8b2b76b6c
commit dee0b5f871

@ -113,7 +113,7 @@ size_categories:
- 10K<n<100K
---
# Dataset Card for OASST1
# OpenAssistant Conversations Dataset (OASST1)
## Dataset Description
@ -129,6 +129,21 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
35 different languages, annotated with 461,292 quality ratings. The corpus is a product
of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
The dataset was exported from the open-assistant.io production database on April 12, 2023.
### Dataset Structure
This dataset contains demonstrations of human-assistant conversations that were collected
on the open-assistant.io website.
All conversations are exported as message trees whose nodes are conversation messages. Each message has a
role, which is either "assistant" or "prompter". The root node of a message tree is called the initial prompt.
In completed trees, the replies to a node with at least two replies have a `rank` field which indicates the users' preference consensus.
The lower the rank, the better the message.
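The tree layout described above can be illustrated with a minimal Python sketch. This is not the dataset's own loading code. It assumes a flat list of messages with hypothetical `message_id` and `parent_id` fields (the card itself only names `role` and `rank`). It also assumes that `rank` is stored on each sibling reply. The sketch links the messages into a tree whose root is the initial prompt, then follows the best-ranked (lowest `rank`) reply at each level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical field names: only `role` and `rank` are described in the card.
    message_id: str
    parent_id: Optional[str]  # None marks the root node (the initial prompt)
    role: str  # "prompter" or "assistant"
    text: str
    rank: Optional[int] = None  # assumed present only on ranked sibling replies
    replies: List["Message"] = field(default_factory=list)


def build_tree(messages: List[Message]) -> Message:
    """Attach each message to its parent and return the root (initial prompt)."""
    by_id = {m.message_id: m for m in messages}
    root = None
    for m in messages:
        if m.parent_id is None:
            root = m
        else:
            by_id[m.parent_id].replies.append(m)
    if root is None:
        raise ValueError("no initial prompt (root node) found")
    return root


def best_path(node: Message) -> List[Message]:
    """Walk down the tree, picking the lowest-ranked (preferred) reply each time."""
    path = [node]
    while node.replies:
        ranked = [r for r in node.replies if r.rank is not None]
        # Unranked replies (e.g. a single reply) fall back to the first one.
        node = min(ranked, key=lambda r: r.rank) if ranked else node.replies[0]
        path.append(node)
    return path


if __name__ == "__main__":
    msgs = [
        Message("p1", None, "prompter", "What is OASST1?"),
        Message("a1", "p1", "assistant", "A dataset.", rank=1),
        Message("a2", "p1", "assistant", "A crowd-sourced conversation corpus.", rank=0),
    ]
    root = build_tree(msgs)
    print([m.message_id for m in best_path(root)])  # ['p1', 'a2']
```

The fallback to the first reply when no sibling carries a rank is a design choice of this sketch, not something the card specifies.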
### Languages
@ -176,79 +191,3 @@ OpenAssistant Conversations incorporates 35 different languages with a distribut
<li>Slovak: 19</li>
</ul>
</details>
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]