General Reasoning on Timely-Eval
[Chart: MATH score; highlighted point: DeepSeek-V3.2, 82.9 (Jan 23, 2026). Metric tabs: MATH, AIME, GPQA Diam]
Updated 4d ago
Evaluation Results

Method            Details          Date      MATH   AIME   GPQA Diam
DeepSeek-V3.2     -                2026.01   82.9   43.3   58.7
TimelyLM-8B       size=8B          2026.01   78     42.5   49.5
Qwen3-32B         size=32B         2026.01   75     45.7   35.5
GPT-5.1(medium)   variant=medium   2026.01   71.5   46.7   71
Qwen3-8B          size=8B          2026.01   71.2   40     37.5
Qwen3-14B         size=14B         2026.01   70.8   41.7   21
Gemini2.5-pro     variant=pro      2026.01   63     37.5   59