OpenLLM Leaderboard

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Downstream Task Evaluation | OpenLLM Leaderboard v1 (test) | MMLU (5-shot): 63.95 | 14
General Language Understanding and Reasoning | OpenLLM Leaderboard BBH, GPQA, IFEVAL, MMLU, MUSR (test) | BBH: 72.7 | 4