Theorem Proving on DeepMath
[Chart: FR (Fetch Rate), PR (Precision Rate) and VR (Verification Rate) over time; highlighted point: Gemini-3-Flash, FR 94, Jan 24, 2026.]
Evaluation Results
FR = Fetch Rate, PR = Precision Rate, VR = Verification Rate.

Method                            Setting             Date     FR   PR   VR
Gemini-3-Flash                    Thinking level=low  2026.01  94   26   20
Baseline                          -                   2026.01  92   16   14
Gemini-3-Pro                      Thinking level=low  2026.01  90   32   30
Claude-Sonnet-4.5 (Agentic)       -                   2026.01  88   28   26
Qwen-Max (Agentic)                -                   2026.01  88   10   10
GPT-5.2 (Agentic)                 -                   2026.01  74   14   14
DeepSeek-V3.2-Thinking (Agentic)  -                   2026.01  70   10    8
DeepSeek-V3.2 (Agentic)           -                   2026.01  68   18   14
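The page reports the three rates but does not say how any of them is computed. The sketch below only copies the reported rows into a list and ranks the methods by each metric. The ranking shows that the method with the highest FR (Gemini-3-Flash, thinking level=low) is not the one with the highest PR or VR (Gemini-3-Pro, thinking level=low). The `Result` type and `rank_by` helper are hypothetical names introduced for illustration. They are not part of any benchmark tooling.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One leaderboard row (illustrative type, not from the benchmark)."""
    method: str
    fr: int  # FR (Fetch Rate), as reported
    pr: int  # PR (Precision Rate), as reported
    vr: int  # VR (Verification Rate), as reported


# Values copied from the Evaluation Results table (all rows dated 2026.01).
RESULTS = [
    Result("Gemini-3-Flash (thinking level=low)", 94, 26, 20),
    Result("Baseline", 92, 16, 14),
    Result("Gemini-3-Pro (thinking level=low)", 90, 32, 30),
    Result("Claude-Sonnet-4.5 (Agentic)", 88, 28, 26),
    Result("Qwen-Max (Agentic)", 88, 10, 10),
    Result("GPT-5.2 (Agentic)", 74, 14, 14),
    Result("DeepSeek-V3.2-Thinking (Agentic)", 70, 10, 8),
    Result("DeepSeek-V3.2 (Agentic)", 68, 18, 14),
]


def rank_by(results, key):
    """Return the rows sorted by one metric ("fr", "pr" or "vr"), highest first.

    Python's sort is stable, so rows with equal scores keep their table order.
    """
    return sorted(results, key=lambda r: getattr(r, key), reverse=True)


if __name__ == "__main__":
    # Print the top-ranked method for each metric.
    for metric in ("fr", "pr", "vr"):
        top = rank_by(RESULTS, metric)[0]
        print(f"{metric.upper()}: {top.method} ({getattr(top, metric)})")
```

Running it prints the top method for each metric. Because the sort is stable, tied rows keep their order from the table. For example, Claude-Sonnet-4.5 and Qwen-Max, both at FR 88, stay in that order.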