Large Model Performance Prediction on 285 models on one Math benchmark

Metric: Top-10 Recall

[Chart: Top-10 Recall; labeled point: Brute-force Evaluation, Feb 12, 2026]

Evaluation Results

Method    Date       Top-10 Recall    Links
          2026.02    100
          2026.02    82
          2026.02    70
          2026.02    52
          2026.02    24