diff --git a/README.md b/README.md
index 1216d89..de131f3 100644
--- a/README.md
+++ b/README.md
@@ -119,7 +119,7 @@ size_categories:
 
 - **Homepage:** https://www.open-assistant.io/
 - **Repository:** https://github.com/LAION-AI/Open-Assistant
-- **Paper:** TBA
+- **Paper:** TBA on April 17, 2023
 
 ### Dataset Summary
 
@@ -129,13 +129,53 @@ corpus consisting of 161,443 messages distributed across 66,497 conversation tre
 35 different languages, annotated with 461,292 quality ratings. The corpus is a product of a worldwide
 crowd-sourcing effort involving over 13,500 volunteers.
 
-### Supported Tasks and Leaderboards
-
-[More Information Needed]
 
 ### Languages
 
-[More Information Needed]
+OpenAssistant Conversations spans 35 languages. The number of messages per language is as follows:
+
+**Languages with over 1000 messages**
+- English: 71956
+- Spanish: 43061
+- Russian: 9089
+- German: 5279
+- Chinese: 4962
+- French: 4251
+- Thai: 3042
+- Portuguese (Brazil): 2969
+- Catalan: 2260
+- Korean: 1553
+- Ukrainian: 1352
+- Italian: 1320
+- Japanese: 1018
+
+**Languages with under 1000 messages**
## Dataset Structure