Theorem Proving on DeepMath
[Chart: FR (Fetch Rate), PR (Precision Rate), and VR (Verification Rate) over time. Highlighted point: Gemini-3-Flash, FR 94 (Jan 24, 2026).]
Evaluation Results
Method                            Setting             Date     FR  PR  VR
Gemini-3-Flash                    Thinking level=low  2026.01  94  26  20
Baseline                          -                   2026.01  92  16  14
Gemini-3-Pro                      Thinking level=low  2026.01  90  32  30
Claude-Sonnet-4.5 (Agentic)       -                   2026.01  88  28  26
Qwen-Max (Agentic)                -                   2026.01  88  10  10
GPT-5.2 (Agentic)                 -                   2026.01  74  14  14
DeepSeek-V3.2-Thinking (Agentic)  -                   2026.01  70  10  8
DeepSeek-V3.2 (Agentic)           -                   2026.01  68  18  14

FR = Fetch Rate, PR = Precision Rate, VR = Verification Rate. Dates are given as year.month.