Theorem Proving on CombiBench
[Chart: Proof Length. Highlighted point: Claude 4.6 Opus, Proof Length 8, Apr 29, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Type / Backbone | Date | Proof Length |
|---|---|---|---|
| Claude 4.6 Opus | Type=Proprietary LLM | 2026.04 | 8 |
| Gemini 3.1 Pro | Type=Proprietary LLM | 2026.04 | 15.8 |
| GPT-5.3-Codex | Type=Proprietary LLM | 2026.04 | 22.3 |
| Gemini 2.5 Pro | Type=Proprietary LLM | 2026.04 | 23.8 |
| DreamProver | Backbone=Gemini 2.5 Pr... | 2026.04 | 29.6 |
| DreamProver | Backbone=Gemini 3.1 Pr... | 2026.04 | 31.3 |
| DreamProver | Backbone=GPT-5.3-Codex... | 2026.04 | 37.4 |
| DeepSeek-Prover-V2-7B | Type=Open-source LLM | 2026.04 | 41.4 |
| Goedel-Prover-V2-8B | Type=Open-source LLM | 2026.04 | 45.3 |
| Hilbert | Backbone=Gemini 3.1 Pr... | 2026.04 | 63.4 |
| Hilbert | Backbone=GPT-5.3-Codex... | 2026.04 | 76.5 |
| Goedel-Prover-V2-32B | Type=Open-source LLM | 2026.04 | 76.7 |
| Hilbert | Backbone=Gemini 2.5 Pr... | 2026.04 | 85.7 |