---
task_categories:
- question-answering
configs:
- config_name: default
  data_files:
  - split: test
    path: test/*.csv
- config_name: by_language
  data_files:
  - split: AR_XY
    path: test/mmlu_AR-XY.csv
  - split: BN_BD
    path: test/mmlu_BN-BD.csv
  - split: DE_DE
    path: test/mmlu_DE-DE.csv
  - split: ES_LA
    path: test/mmlu_ES-LA.csv
  - split: FR_FR
    path: test/mmlu_FR-FR.csv
  - split: HI_IN
    path: test/mmlu_HI-IN.csv
  - split: ID_ID
    path: test/mmlu_ID-ID.csv
  - split: IT_IT
    path: test/mmlu_IT-IT.csv
  - split: JA_JP
    path: test/mmlu_JA-JP.csv
  - split: KO_KR
    path: test/mmlu_KO-KR.csv
  - split: PT_BR
    path: test/mmlu_PT-BR.csv
  - split: SW_KE
    path: test/mmlu_SW-KE.csv
  - split: YO_NG
    path: test/mmlu_YO-NG.csv
  - split: ZH_CN
    path: test/mmlu_ZH-CN.csv
language:
- ar
- bn
- de
- es
- fr
- hi
- id
- it
- ja
- ko
- pt
- sw
- yo
- zh
---

# Multilingual Massive Multitask Language Understanding (MMMLU)

The MMLU is a widely recognized benchmark of general knowledge attained by AI models. It covers a broad range of topics across 57 categories, ranging from elementary-level knowledge up to advanced professional subjects like law, physics, history, and computer science.

We translated the MMLU's test set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.

This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.

## Locales

MMMLU contains the MMLU test set translated into the following locales:

- AR_XY (Arabic)
- BN_BD (Bengali)
- DE_DE (German)
- ES_LA (Spanish)
- FR_FR (French)
- HI_IN (Hindi)
- ID_ID (Indonesian)
- IT_IT (Italian)
- JA_JP (Japanese)
- KO_KR (Korean)
- PT_BR (Brazilian Portuguese)
- SW_KE (Swahili)
- YO_NG (Yoruba)
- ZH_CN (Simplified Chinese)
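
The `default` config gathers every per-language CSV into a single `test` split, while the `by_language` config exposes one split per locale code listed above. The sketch below shows one way to load either config with the Hugging Face `datasets` library; the repository id `openai/MMMLU` is an assumption and may need adjusting for where your copy of the dataset is hosted.

```python
from datasets import load_dataset

# All locales combined into one "test" split via the "default" config.
# NOTE: the repository id "openai/MMMLU" is assumed here; adjust it if the
# dataset lives under a different namespace.
mmmlu_all = load_dataset("openai/MMMLU", "default", split="test")

# A single locale via the "by_language" config; split names match the locale
# codes above (e.g. FR_FR, SW_KE, ZH_CN).
mmmlu_fr = load_dataset("openai/MMMLU", "by_language", split="FR_FR")

print(mmmlu_fr[0])  # one translated question with its answer choices
```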

## Sources

Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding.

OpenAI Simple Evals GitHub Repository