Update README.md

This commit is contained in:
Andreas Köpf 2023-04-15 13:26:01 +00:00 committed by huggingface-web
parent a2c461b87b
commit 12c7304a36

@ -132,29 +132,35 @@ of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
### Dataset Structure
This dataset contains demonstrations of human-assistant conversations which were collected
on the open-assistant.io website until April, 12 2023.
on the [open-assistant.io](https://www.open-assistant.io/) website until April, 12 2023.
Conversations are exported as message trees which contain conversation messages as nodes.
The root node of a message tree is called the initial prompt. Each message node can have
Conversations are exported as conversation trees which contain conversation messages as nodes.
The root node of a conversation tree is called the initial prompt. Each message can have
multiple replies. Nodes with more than one reply can have a `rank` field indicating the
order among the siblings sorted by user preference (the most preferred message has rank 0).
user preference (the most preferred message has rank 0).
All messages have a role which can either be "assistant" or "prompter". The roles in
conversation threads from prompt to leaf node in a message tree are stricly alternating
conversation threads from prompt to leaf node in a conversation tree are stricly alternating
between "assistant" and "prompter".
## Main Dataset Files
Data is provided either as nested as a message tree or as flat list (table) of messages.
Names of files containing message trees end in `.trees.jsonl.gz` while files containing
a list of messages with a file name ending in `.messages.jsonl.gz`.
Data is provided either as nested messages in conversation trees or as flat list of messages.
Mesages
The type of file can be inferred from the file name extension:
- `.trees.jsonl.gz`: Conversation trees with nested messages
- `.messages.jsonl.gz`: Flat list of messages
### Ready for export trees
```
2023-04-12_oasst_ready.trees.jsonl.gz 10364 trees with 88838 total messages
2023-04-12_oasst_ready.messages.jsonl.gz 88838 messages
```
### All trees
```
2023-04-12_oasst_all.trees.jsonl.gz 66497 trees with 161443 total messages
@ -162,7 +168,6 @@ Mesages
```
### Languages
OpenAssistant Conversations incorporates 35 different languages with a distribution of messages as follows: