
Large Model Performance Prediction on 285 models on one Math benchmark

Top-10 Recall: 100

Brute-force Evaluation

[Chart: Top-10 Recall by date (Feb 12, 2026)]
Updated 1mo ago

Evaluation Results

Date       Top-10 Recall
2026.02    100
2026.02    82
2026.02    70
2026.02    52
2026.02    24
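
The page reports Top-10 Recall without defining it. A common reading, assumed here and not confirmed by the source, is the percentage of the truly best 10 models that also appear among the 10 models a prediction method ranks highest. The sketch below illustrates that reading; the function name `top_k_recall`, the dictionary inputs, and the 0 to 100 scale are all hypothetical choices for illustration, not the benchmark's actual scoring code.

```python
def top_k_recall(true_scores, predicted_scores, k=10):
    """Percentage of the true top-k models found in the predicted top-k.

    Both arguments map a model name to a score (higher is better).
    This definition is an assumption; the benchmark may score differently.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not true_scores:
        raise ValueError("true_scores must not be empty")

    def top_k(scores):
        # Rank model names by score, best first, and keep the first k.
        ranked = sorted(scores, key=scores.get, reverse=True)
        return set(ranked[:k])

    true_top = top_k(true_scores)
    predicted_top = top_k(predicted_scores)
    return 100.0 * len(true_top & predicted_top) / len(true_top)


# Toy data: 20 hypothetical models whose true score equals their index.
true_scores = {f"m{i}": float(i) for i in range(20)}

# A predictor that reproduces the true scores exactly.
perfect = dict(true_scores)

# A predictor that badly underestimates the single best model.
one_miss = dict(true_scores)
one_miss["m19"] = -1.0

recall_perfect = top_k_recall(true_scores, perfect)
recall_one_miss = top_k_recall(true_scores, one_miss)
```

Under this assumed definition, a method that actually evaluates every model recovers the true ranking and so scores 100, which would be consistent with the Brute-force Evaluation entry, while a cheaper predictor loses points for each true top-10 model it ranks outside its own top 10.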