Update README.md
This commit is contained in:
parent
f8b2b76b6c
commit
dee0b5f871
93
README.md
93
README.md
@ -113,7 +113,7 @@ size_categories:
|
||||
- 10K<n<100K
|
||||
---
|
||||
|
||||
# Dataset Card for OASST1
|
||||
# OpenAssistant Conversations Dataset (OASST1)
|
||||
|
||||
## Dataset Description
|
||||
|
||||
@ -129,6 +129,21 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
|
||||
35 different languages, annotated with 461,292 quality ratings. The corpus is a product
|
||||
of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
|
||||
|
||||
The dataset was exported from the open-assistant.io production database on April, 12 2023.
|
||||
|
||||
### Dataset Structure
|
||||
|
||||
Thes dataset contains demonstrations of of human-assistant conversations that were collected
|
||||
on the open-assistant.io website.
|
||||
|
||||
All conversations are exported as message trees which contain conversation messages nodes. Each message has a
|
||||
role which can either be "assistant" or "prompter". The root node of a message tree is called the initial prompt.
|
||||
Nodes with at least two replies of completed trees have a `rank` field which indicates the users' preference consensus.
|
||||
The lower the rank the better the message.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
### Languages
|
||||
|
||||
@ -176,79 +191,3 @@ OpenAssistant Conversations incorporates 35 different languages with a distribut
|
||||
<li>Slovak: 19</li>
|
||||
</ul>
|
||||
</details>
|
||||
|
||||
## Dataset Structure
|
||||
|
||||
### Data Instances
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Data Fields
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Data Splits
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
## Dataset Creation
|
||||
|
||||
### Curation Rationale
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Source Data
|
||||
|
||||
#### Initial Data Collection and Normalization
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
#### Who are the source language producers?
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Annotations
|
||||
|
||||
#### Annotation process
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
#### Who are the annotators?
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Personal and Sensitive Information
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
## Considerations for Using the Data
|
||||
|
||||
### Social Impact of Dataset
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Discussion of Biases
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Other Known Limitations
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
## Additional Information
|
||||
|
||||
### Dataset Curators
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Licensing Information
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Citation Information
|
||||
|
||||
[More Information Needed]
|
||||
|
||||
### Contributions
|
||||
|
||||
[More Information Needed]
|
Loading…
x
Reference in New Issue
Block a user