diff --git a/README.md b/README.md
index a109594..486772e 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,12 @@
 ---
 license: mit
+tags:
+ - human-feedback
 ---
-## Overview
+
+# Dataset Card for HH-RLHF
+
+## Dataset Summary
 
 This repository provides access to:
 1. Human preference data about helpfulness and harmlessness from [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862)
@@ -39,6 +44,24 @@ Each line of the jsonl file contains a dictionary with the following fields:
 - `is_upworker` a binary indicator that is true if the red team member was from the crowd platform Upwork or false if they were from MTurk
 - `tags` a list of up to 6 tags per transcript. tags are short descriptions of the red team attempts generated by crowdworkers who reviewed red team data post-hoc. tags were only provided for a random sample of 1000 red team attempts for two of four model types.
 
+## Usage
+
+Each of the above datasets is located in a separate sub-directory. To load an individual subset, use the `data_dir` argument of the `load_dataset()` function as follows:
+
+```python
+from datasets import load_dataset
+
+# Load all helpfulness/harmless subsets (share the same schema)
+dataset = load_dataset("Anthropic/hh-rlhf")
+
+# Load one of the harmless subsets
+dataset = load_dataset("Anthropic/hh-rlhf", data_dir="harmless-base")
+
+# Load the red teaming subset
+dataset = load_dataset("Anthropic/hh-rlhf", data_dir="red-team-attempts")
+```
+
 ## Contact
-The original authors host this dataset on github here: https://github.com/anthropics/hh-rlhf
+
+The original authors host this dataset on GitHub here: https://github.com/anthropics/hh-rlhf
 You can submit inquiries to: redteam@anthropic.com
\ No newline at end of file