Add human-feedback tag and usage instructions (#4)

- Add human-feedback tag and usage instructions (53bb19af48e3dc246069b4a2e757a5ee5caeb549)


Co-authored-by: Lewis Tunstall <lewtun@users.noreply.huggingface.co>
Deep Ganguli 2023-01-04 19:16:45 +00:00 committed by system
parent 84e2ad8be2
commit ab92ea47d0

@@ -1,7 +1,12 @@
---
license: mit
tags:
- human-feedback
---
# Dataset Card for HH-RLHF
## Dataset Summary
This repository provides access to:
1. Human preference data about helpfulness and harmlessness from [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862)
@@ -39,6 +44,24 @@ Each line of the jsonl file contains a dictionary with the following fields:
- `is_upworker` a binary indicator that is true if the red team member was recruited from the crowdworking platform Upwork and false if they were recruited from MTurk
- `tags` a list of up to 6 tags per transcript. Tags are short descriptions of the red team attempts, generated by crowdworkers who reviewed the red team data post-hoc. Tags were only provided for a random sample of 1,000 red team attempts for two of the four model types.
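To make the schema concrete, here is a minimal sketch of one red team record (one line of the jsonl file); the values are invented for illustration, and fields other than `is_upworker` and `tags` are elided:
```python
# Hypothetical red team record; values are invented for illustration.
record = {
    # ... other fields described above, e.g. the transcript itself ...
    "is_upworker": False,  # this red team member came from MTurk
    "tags": ["made-up-tag-1", "made-up-tag-2"],  # up to 6 post-hoc reviewer tags; absent for most attempts
}
```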
## Usage
Each of the above datasets is located in a separate sub-directory. To load an individual subset, use the `data_dir` argument of the `load_dataset()` function as follows:
```python
from datasets import load_dataset

# Load all helpfulness/harmless subsets (share the same schema)
dataset = load_dataset("Anthropic/hh-rlhf")

# Load one of the harmless subsets
dataset = load_dataset("Anthropic/hh-rlhf", data_dir="harmless-base")

# Load the red teaming subset
dataset = load_dataset("Anthropic/hh-rlhf", data_dir="red-team-attempts")
```
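Building on the loading example above, the sketch below shows one way to pull out the red team attempts that received reviewer tags. The `split="train"` argument and the dictionary-based filter are assumptions about how this subset loads, not something the card specifies:
```python
from datasets import load_dataset

# Load the red teaming subset; this assumes it exposes a single "train"
# split, as single-file jsonl datasets typically do.
red_team = load_dataset("Anthropic/hh-rlhf", data_dir="red-team-attempts", split="train")

# Keep only the attempts with post-hoc reviewer tags (roughly 1,000 of
# them, per the field description above).
tagged = red_team.filter(lambda row: row.get("tags"))

print(f"{len(tagged)} of {len(red_team)} attempts carry tags")
print(tagged[0]["is_upworker"], tagged[0]["tags"])
```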
## Contact
The original authors host this dataset on GitHub here: https://github.com/anthropics/hh-rlhf
You can submit inquiries to: redteam@anthropic.com