openai/MMMLU

task_categories: question-answering
license: mit

configs

config_name	split	path
default	test	test/*.csv
AR_XY	test	test/mmlu_AR-XY.csv
BN_BD	test	test/mmlu_BN-BD.csv
DE_DE	test	test/mmlu_DE-DE.csv
ES_LA	test	test/mmlu_ES-LA.csv
FR_FR	test	test/mmlu_FR-FR.csv
HI_IN	test	test/mmlu_HI-IN.csv
ID_ID	test	test/mmlu_ID-ID.csv
IT_IT	test	test/mmlu_IT-IT.csv
JA_JP	test	test/mmlu_JA-JP.csv
KO_KR	test	test/mmlu_KO-KR.csv
PT_BR	test	test/mmlu_PT-BR.csv
SW_KE	test	test/mmlu_SW-KE.csv
YO_NG	test	test/mmlu_YO-NG.csv
ZH_CN	test	test/mmlu_ZH-CN.csv

Multilingual Massive Multitask Language Understanding (MMMLU)

MMLU is a widely recognized benchmark of the general knowledge attained by AI models. It spans 57 subject categories, ranging from elementary-level knowledge to advanced professional subjects such as law, physics, history, and computer science.

We translated the MMLU’s test set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.

This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.

Locales

MMMLU contains the MMLU test set translated into the following locales:

AR_XY (Arabic)
BN_BD (Bengali)
DE_DE (German)
ES_LA (Spanish)
FR_FR (French)
HI_IN (Hindi)
ID_ID (Indonesian)
IT_IT (Italian)
JA_JP (Japanese)
KO_KR (Korean)
PT_BR (Brazilian Portuguese)
SW_KE (Swahili)
YO_NG (Yoruba)
ZH_CN (Simplified Chinese)
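
Each locale code in the list is also a configuration name in the dataset card metadata, and each configuration points at one CSV file under test/. The following is a minimal sketch of how a single locale might be selected, not the code used to run the evaluations. The helper names `csv_path_for` and `load_locale` are hypothetical, and the loader assumes the Hugging Face `datasets` package is installed and the Hub is reachable. No assumption is made here about the column names inside the CSV files.

```python
from typing import Dict

# Locale config names and languages, as listed in the dataset card.
LOCALES: Dict[str, str] = {
    "AR_XY": "Arabic",
    "BN_BD": "Bengali",
    "DE_DE": "German",
    "ES_LA": "Spanish",
    "FR_FR": "French",
    "HI_IN": "Hindi",
    "ID_ID": "Indonesian",
    "IT_IT": "Italian",
    "JA_JP": "Japanese",
    "KO_KR": "Korean",
    "PT_BR": "Brazilian Portuguese",
    "SW_KE": "Swahili",
    "YO_NG": "Yoruba",
    "ZH_CN": "Simplified Chinese",
}


def csv_path_for(config_name: str) -> str:
    """Return the repository path of the CSV behind a locale config.

    Hypothetical helper: it mirrors the naming pattern visible in the
    card metadata (config FR_FR -> test/mmlu_FR-FR.csv).
    """
    if config_name not in LOCALES:
        raise ValueError(f"Unknown MMMLU locale: {config_name!r}")
    return f"test/mmlu_{config_name.replace('_', '-')}.csv"


def load_locale(config_name: str):
    """Load the test split for one locale from the Hugging Face Hub.

    Hypothetical helper: requires `pip install datasets` and network access.
    The import is local so the rest of this sketch runs without the package.
    """
    csv_path_for(config_name)  # validates the locale name before downloading
    from datasets import load_dataset

    return load_dataset("openai/MMMLU", config_name, split="test")


if __name__ == "__main__":
    # List every locale with the file its config reads from.
    for name, language in LOCALES.items():
        print(f"{name}\t{language}\t{csv_path_for(name)}")
```

Calling `load_locale("YO_NG")` would fetch only the Yoruba file, whereas the `default` configuration matches test/*.csv and therefore reads all locales together.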

Sources

Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding.

OpenAI Simple Evals GitHub Repository
