
Advanced Reasoning Suite on Open LLM Leaderboard (MMLU-PRO, BBH, GPQA, MATH, IFEval) (test)

28.38 MMLU-PRO

Cal-DPO

[Chart: MMLU-PRO score by date; axis range about 25.77 to 27.80; Dec 19, 2024]
Updated 4d ago

Evaluation Results

Method  Links
2024.12  28.38  21.72  43.55  29.78  2.42  34.87  63.23
2024.12  27.64  3.21  41.09  29.36  2.04  28.13  58.28
2024.12  27.04  13.32  42.05  28.45  2.15  33.06  57
2024.12  26.73  10.49  43.27  28.44  1.36  21.76  61.26
2024.12  26.52  12.45  42.33  27.93  1.38  33.74  55.38
2024.12  25.96  11.05  42.39  28.05  1.27  23.18  62.01
2024.12  25.87  11.52  40.59  28.15  1.25  27.14  60.84