Update README.md
This commit is contained in:
parent
dee0b5f871
commit
a2c461b87b
34
README.md
34
README.md
@ -129,19 +129,37 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
|
||||
35 different languages, annotated with 461,292 quality ratings. The corpus is a product
|
||||
of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
|
||||
|
||||
The dataset was exported from the open-assistant.io production database on April, 12 2023.
|
||||
|
||||
### Dataset Structure
|
||||
|
||||
Thes dataset contains demonstrations of of human-assistant conversations that were collected
|
||||
on the open-assistant.io website.
|
||||
This dataset contains demonstrations of human-assistant conversations which were collected
|
||||
on the open-assistant.io website until April, 12 2023.
|
||||
|
||||
All conversations are exported as message trees which contain conversation messages nodes. Each message has a
|
||||
role which can either be "assistant" or "prompter". The root node of a message tree is called the initial prompt.
|
||||
Nodes with at least two replies of completed trees have a `rank` field which indicates the users' preference consensus.
|
||||
The lower the rank the better the message.
|
||||
Conversations are exported as message trees which contain conversation messages as nodes.
|
||||
The root node of a message tree is called the initial prompt. Each message node can have
|
||||
multiple replies. Nodes with more than one reply can have a `rank` field indicating the
|
||||
order among the siblings sorted by user preference (the most preferred message has rank 0).
|
||||
All messages have a role which can either be "assistant" or "prompter". The roles in
|
||||
conversation threads from prompt to leaf node in a message tree are stricly alternating
|
||||
between "assistant" and "prompter".
|
||||
|
||||
## Main Dataset Files
|
||||
|
||||
Data is provided either as nested as a message tree or as flat list (table) of messages.
|
||||
Names of files containing message trees end in `.trees.jsonl.gz` while files containing
|
||||
a list of messages with a file name ending in `.messages.jsonl.gz`.
|
||||
|
||||
Mesages
|
||||
|
||||
```
|
||||
2023-04-12_oasst_ready.trees.jsonl.gz 10364 trees with 88838 total messages
|
||||
2023-04-12_oasst_ready.messages.jsonl.gz 88838 messages
|
||||
```
|
||||
|
||||
|
||||
```
|
||||
2023-04-12_oasst_all.trees.jsonl.gz 66497 trees with 161443 total messages
|
||||
2023-04-12_oasst_all.messages.jsonl.gz 161443 messages
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
Loading…
x
Reference in New Issue
Block a user