OpenAssistant/oasst1

Andreas Köpf 7bf6eb80f9 Update README.md

2023-04-15 00:16:00 +00:00

3.1 KiB

license: apache-2.0

dataset_info:

features
name	dtype
message_id	string
parent_id	string
user_id	string
created_date	string
text	string
role	string
lang	string
review_count	int32
review_result	bool
deleted	bool
rank	int32
synthetic	bool
model_name	string
detoxify	struct
detoxify.toxicity	float64
detoxify.severe_toxicity	float64
detoxify.obscene	float64
detoxify.identity_attack	float64
detoxify.insult	float64
detoxify.threat	float64
detoxify.sexual_explicit	float64
message_tree_id	string
tree_state	string
emojis	sequence
emojis.name	string
emojis.count	int32
labels	sequence
labels.name	string
labels.value	float64
labels.count	int32

splits
name	num_bytes	num_examples
train	100367999	84437
validation	5243405	4401

download_size: 41596430
dataset_size: 105611404
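A minimal Python sketch of how one record with this schema might be handled after loading. It assumes, without confirmation from this card, that a `sequence` of structs such as `labels` arrives as a dict of parallel lists (the usual Hugging Face `datasets` convention) and that `detoxify` may be `None` for unscored messages. The record is hypothetical, and the helper names `labels_as_dict` and `is_low_toxicity` and the 0.5 threshold are illustrative, not part of the dataset.

```python
from typing import Optional


def labels_as_dict(record: dict) -> dict[str, float]:
    """Zip the parallel `name` and `value` lists of `labels` into one mapping.

    Assumption: `labels` is a dict of lists, or None when a message has no labels.
    """
    labels = record.get("labels") or {"name": [], "value": []}
    return dict(zip(labels["name"], labels["value"]))


def is_low_toxicity(record: dict, threshold: float = 0.5) -> bool:
    """Return True when the Detoxify `toxicity` score is below `threshold`.

    Hypothetical policy: messages without a `detoxify` score are kept.
    """
    scores: Optional[dict] = record.get("detoxify")
    if scores is None or scores.get("toxicity") is None:
        return True
    return scores["toxicity"] < threshold


# Hypothetical record following the feature list above (values are made up).
example = {
    "message_id": "m1",
    "parent_id": None,
    "text": "What is a monad?",
    "role": "prompter",
    "lang": "en",
    "detoxify": {"toxicity": 0.01, "severe_toxicity": 0.0},
    "labels": {"name": ["spam", "quality"], "value": [0.0, 0.75], "count": [3, 3]},
}

print(labels_as_dict(example))   # {'spam': 0.0, 'quality': 0.75}
print(is_low_toxicity(example))  # True
```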

Dataset Card for OASST1

Dataset Description

Homepage: https://www.open-assistant.io/
Repository: https://github.com/LAION-AI/Open-Assistant
Paper: TBA

Dataset Summary

In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations (OASST1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages, annotated with 461,292 quality ratings. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers.
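The messages are organized into conversation trees, and the feature list links them through `message_id`, `parent_id`, and `message_tree_id`, so a flat list of records can be regrouped into threads. The following is an illustrative sketch only: it assumes a tree's root message has `parent_id` set to `None` and that every other `parent_id` refers to a message in the same tree. The helper `build_threads` and the sample messages are hypothetical.

```python
from collections import defaultdict


def build_threads(messages: list[dict]) -> dict[str, list[list[str]]]:
    """Group messages by `message_tree_id` and list every root-to-leaf thread.

    Assumption: a root message has `parent_id` None; all other parents exist.
    """
    children: dict[str, list[dict]] = defaultdict(list)
    roots: dict[str, dict] = {}
    for msg in messages:
        if msg["parent_id"] is None:
            roots[msg["message_tree_id"]] = msg
        else:
            children[msg["parent_id"]].append(msg)

    threads: dict[str, list[list[str]]] = {}
    for tree_id, root in roots.items():
        paths: list[list[str]] = []
        # Iterative depth-first walk; each stack entry carries the path so far.
        stack = [(root, [root["text"]])]
        while stack:
            node, path = stack.pop()
            kids = children.get(node["message_id"], [])
            if not kids:
                paths.append(path)  # a leaf ends one complete thread
            for kid in kids:
                stack.append((kid, path + [kid["text"]]))
        threads[tree_id] = paths
    return threads


# Hypothetical tree: one prompt with two replies, one of which has a follow-up.
messages = [
    {"message_id": "a", "parent_id": None, "message_tree_id": "t1", "text": "Hi"},
    {"message_id": "b", "parent_id": "a", "message_tree_id": "t1", "text": "Hello!"},
    {"message_id": "c", "parent_id": "a", "message_tree_id": "t1", "text": "Hey."},
    {"message_id": "d", "parent_id": "b", "message_tree_id": "t1", "text": "How are you?"},
]

for thread in build_threads(messages)["t1"]:
    print(" -> ".join(thread))
# Hi -> Hey.
# Hi -> Hello! -> How are you?
```

An explicit stack is used instead of recursion so that unusually deep trees cannot hit Python's recursion limit.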

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

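The data is divided into a train split of 84,437 examples (100,367,999 bytes) and a validation split of 4,401 examples (5,243,405 bytes).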

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

The dataset is released under the Apache 2.0 license (apache-2.0).

Citation Information

[More Information Needed]

Contributions

[More Information Needed]