diff --git a/README.md b/README.md
index cce1692..0205d2e 100644
--- a/README.md
+++ b/README.md
@@ -34,4 +34,36 @@ configs:
     path: "test/mmlu_YO-NG.csv"
   - split: ZH_CN
     path: "test/mmlu_ZH-CN.csv"
----
\ No newline at end of file
+---
+
+# Multilingual Massive Multitask Language Understanding (MMMLU)
+
+The MMLU is a widely recognized benchmark of general knowledge attained by AI models. It covers a broad range of topics across 57 categories, ranging from elementary-level knowledge to advanced professional subjects such as law, physics, history, and computer science.
+
+We translated the MMLU’s test set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.
+
+This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.
+
+## Locales
+
+MMMLU contains the MMLU test set translated into the following locales:
+* AR_XY (Arabic)
+* BN_BD (Bengali)
+* DE_DE (German)
+* ES_LA (Spanish)
+* FR_FR (French)
+* HI_IN (Hindi)
+* ID_ID (Indonesian)
+* IT_IT (Italian)
+* JA_JP (Japanese)
+* KO_KR (Korean)
+* PT_BR (Brazilian Portuguese)
+* SW_KE (Swahili)
+* YO_NG (Yoruba)
+* ZH_CN (Simplified Chinese)
+
+## Sources
+
+Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). [*Measuring Massive Multitask Language Understanding*](https://arxiv.org/abs/2009.03300).
+
+[OpenAI Simple Evals GitHub Repository](https://github.com/openai/simple-evals)
\ No newline at end of file
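
The YAML `configs` block in the diff maps each locale to a split backed by a per-locale CSV under `test/`. As a minimal sketch of how a consumer could load one locale with the Hugging Face `datasets` library: the repo id `openai/MMMLU` and the split names are assumptions based on the configs above, not confirmed by this diff, so adjust them to wherever the dataset is actually hosted.

```python
# Minimal sketch: load a single MMMLU locale split with the `datasets` library.
# Assumptions: the dataset lives on the Hugging Face Hub under a repo id like
# "openai/MMMLU" (hypothetical here), and each locale is exposed as a split
# named after its locale code, as in the YAML configs in this diff.
from datasets import load_dataset

mmmlu_fr = load_dataset("openai/MMMLU", split="FR_FR")  # assumed repo id and split name
print(len(mmmlu_fr))   # number of translated MMLU test items
print(mmmlu_fr[0])     # inspect one row of the per-locale CSV
```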