| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMMLU (Massive Multilingual Language Understanding) | Qwen-3-30B-A3B | Accuracy | 79.5 | 21 | 4d ago |
| MMMLU | Task Arithmetic | Accuracy (Korean) | 60.5 | 20 | 4d ago |
| MMLU-ProX | Qwen-3-30B-A3B | Accuracy | 72 | 16 | 4d ago |
| M-MMLU (test) | REFLECT | zh Accuracy | 65.2 | 14 | 3d ago |
| French (HellaSwag, ARC-Challenge, XNLI, and MMLU) translated (test) | GenKnowSub | HellaSwag Accuracy | 57.83 | 8 | 4d ago |
| German (HellaSwag, ARC-Challenge, XNLI, and MMLU) translated (test) | Phi-3 | HellaSwag | 52.48 | 8 | 4d ago |
| Portuguese | Qwen2.5 | Average Performance | 64.6 | 6 | 4d ago |
| French | Qwen2.5 | Average Performance | 55.5 | 6 | 4d ago |
| Spanish | Gamayun | Average Performance | 55.7 | 6 | 4d ago |
| German | Qwen2.5 | Average Performance | 55.7 | 6 | 4d ago |
| Bulgarian | Gamayun | Average Performance | 48.4 | 6 | 4d ago |
| Arabic | Gamayun | Average Performance | 0.572 | 6 | 4d ago |
| Russian | Gamayun | Arc-ru | 34.9 | 6 | 4d ago |
| Chinese | Qwen2.5 | Average Performance | 68.7 | 5 | 4d ago |
| Thai | Qwen2.5 | Average Performance | 54.5 | 5 | 4d ago |
| Multilingual Understanding | Qwen2-72B | Accuracy | 80.7 | 5 | 4d ago |
| INCLUDE 5-shot | ERNIE 5.0-Base | Accuracy | 77.81 | 3 | 4d ago |
| MMMLU 5-shot | ERNIE 5.0-Base | Accuracy | 78.94 | 3 | 4d ago |