corrected a typo (#5)
- corrected a typo (77fc53d7cef8458c3881219f7a641b3cb9b22d22)

Co-authored-by: Ilyas Moutawwakil <IlyasMoutawwakil@users.noreply.huggingface.co>
parent e49c179d8d
commit c63a64ac7c
@@ -131,7 +131,7 @@ Falcon-40B was trained on 1,000B tokens of [RefinedWeb](https://huggingface.co/d
 | **Data source** | **Fraction** | **Tokens** | **Sources** |
 |--------------------|--------------|------------|-----------------------------------|
 | [RefinedWeb-English](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) | 75% | 750B | massive web crawl |
-| RefinedWeb-Europe | 7% | 70B | European massive zeb crawl |
+| RefinedWeb-Europe | 7% | 70B | European massive web crawl |
 | Books | 6% | 60B | |
 | Conversations | 5% | 50B | Reddit, StackOverflow, HackerNews |
 | Code | 5% | 50B | |