---
task_categories:
- question-answering
configs:
- config_name: default
  data_files:
  - split: test
    path: test/*.csv
- config_name: by_language
  data_files:
  - split: AR_XY
    path: test/mmlu_AR-XY.csv
  - split: BN_BD
    path: test/mmlu_BN-BD.csv
  - split: DE_DE
    path: test/mmlu_DE-DE.csv
  - split: ES_LA
    path: test/mmlu_ES-LA.csv
  - split: FR_FR
    path: test/mmlu_FR-FR.csv
  - split: HI_IN
    path: test/mmlu_HI-IN.csv
  - split: ID_ID
    path: test/mmlu_ID-ID.csv
  - split: IT_IT
    path: test/mmlu_IT-IT.csv
  - split: JA_JP
    path: test/mmlu_JA-JP.csv
  - split: KO_KR
    path: test/mmlu_KO-KR.csv
  - split: PT_BR
    path: test/mmlu_PT-BR.csv
  - split: SW_KE
    path: test/mmlu_SW-KE.csv
  - split: YO_NG
    path: test/mmlu_YO-NG.csv
  - split: ZH_CN
    path: test/mmlu_ZH-CN.csv
language:
- ar
- bn
- de
- es
- fr
- hi
- id
- it
- ja
- ko
- pt
- sw
- yo
- zh
---

# Multilingual Massive Multitask Language Understanding (MMMLU)

The MMLU is a widely recognized benchmark of general knowledge attained by AI models. It covers a broad range of topics across 57 categories, ranging from elementary-level knowledge up to advanced professional subjects like law, physics, history, and computer science.

We translated the MMLU's test set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.

This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.

## Locales

MMMLU contains the MMLU test set translated into the following locales:

- AR_XY (Arabic)
- BN_BD (Bengali)
- DE_DE (German)
- ES_LA (Spanish)
- FR_FR (French)
- HI_IN (Hindi)
- ID_ID (Indonesian)
- IT_IT (Italian)
- JA_JP (Japanese)
- KO_KR (Korean)
- PT_BR (Brazilian Portuguese)
- SW_KE (Swahili)
- YO_NG (Yoruba)
- ZH_CN (Simplified Chinese)
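
The `default` config gathers every per-language CSV into a single `test` split, while the `by_language` config exposes one split per locale code listed above. The sketch below shows one way to load either config with the Hugging Face `datasets` library; the repository id `openai/MMMLU` is an assumption and may need adjusting for where your copy of the dataset is hosted.

```python
from datasets import load_dataset

# All locales combined into one "test" split via the "default" config.
# NOTE: the repository id "openai/MMMLU" is assumed here; adjust it if the
# dataset lives under a different namespace.
mmmlu_all = load_dataset("openai/MMMLU", "default", split="test")

# A single locale via the "by_language" config; split names match the locale
# codes above (e.g. FR_FR, SW_KE, ZH_CN).
mmmlu_fr = load_dataset("openai/MMMLU", "by_language", split="FR_FR")

print(mmmlu_fr[0])  # one translated question with its answer choices
```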

## Sources

Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding.

OpenAI Simple Evals GitHub Repository