
General Reasoning on MMLU Pro OOD official (test)

73.43 Overall Accuracy

Qwen3-4B-2507

[Chart: accuracy by date; Dec 22, 2025]
Updated 4d ago

Evaluation Results

Method  Links
2025.12  73.43  78.9   71.29  68.64  68.78  79.5
2025.12  72.67  80.46  68.51  67.49  67.8   79.08
2025.12  72.37  79.79  70.13  67.05  67.07  77.82
2025.12  72.1   80.01  67.9   68.63  65.85  78.1
2025.12  50.07  57.81  43.19  41.34  46.34  61.65
2025.12  48.6   56.77  41.88  43.64  41.71  59