Update README.md
This commit is contained in:
parent
f8b2b76b6c
commit
dee0b5f871
93
README.md
93
README.md
@ -113,7 +113,7 @@ size_categories:
|
|||||||
- 10K<n<100K
|
- 10K<n<100K
|
||||||
---
|
---
|
||||||
|
|
||||||
# Dataset Card for OASST1
|
# OpenAssistant Conversations Dataset (OASST1)
|
||||||
|
|
||||||
## Dataset Description
|
## Dataset Description
|
||||||
|
|
||||||
@ -129,6 +129,21 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
|
|||||||
35 different languages, annotated with 461,292 quality ratings. The corpus is a product
|
35 different languages, annotated with 461,292 quality ratings. The corpus is a product
|
||||||
of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
|
of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
|
||||||
|
|
||||||
|
The dataset was exported from the open-assistant.io production database on April, 12 2023.
|
||||||
|
|
||||||
|
### Dataset Structure
|
||||||
|
|
||||||
|
Thes dataset contains demonstrations of of human-assistant conversations that were collected
|
||||||
|
on the open-assistant.io website.
|
||||||
|
|
||||||
|
All conversations are exported as message trees which contain conversation messages nodes. Each message has a
|
||||||
|
role which can either be "assistant" or "prompter". The root node of a message tree is called the initial prompt.
|
||||||
|
Nodes with at least two replies of completed trees have a `rank` field which indicates the users' preference consensus.
|
||||||
|
The lower the rank the better the message.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### Languages
|
### Languages
|
||||||
|
|
||||||
@ -176,79 +191,3 @@ OpenAssistant Conversations incorporates 35 different languages with a distribut
|
|||||||
<li>Slovak: 19</li>
|
<li>Slovak: 19</li>
|
||||||
</ul>
|
</ul>
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
## Dataset Structure
|
|
||||||
|
|
||||||
### Data Instances
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Data Fields
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Data Splits
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
## Dataset Creation
|
|
||||||
|
|
||||||
### Curation Rationale
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Source Data
|
|
||||||
|
|
||||||
#### Initial Data Collection and Normalization
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
#### Who are the source language producers?
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Annotations
|
|
||||||
|
|
||||||
#### Annotation process
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
#### Who are the annotators?
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Personal and Sensitive Information
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
## Considerations for Using the Data
|
|
||||||
|
|
||||||
### Social Impact of Dataset
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Discussion of Biases
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Other Known Limitations
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
## Additional Information
|
|
||||||
|
|
||||||
### Dataset Curators
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Licensing Information
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Citation Information
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
||||||
|
|
||||||
### Contributions
|
|
||||||
|
|
||||||
[More Information Needed]
|
|
Loading…
x
Reference in New Issue
Block a user