Formal Theorem Proving on large-scale benchmark 2,000 problems (test)
[Chart: FR Rate by method. Top entry: TheoremForge, 0.813 (Jan 24, 2026). Other metrics available: PR Rate, VR Rate, Expert Calls, Avg Cost.]
Evaluation Results
Method                                  Date     FR Rate  PR Rate  VR Rate  Expert Calls  Avg Cost
TheoremForge (Backbone=Gemini-3-Flash)  2026.01  0.813    0.1445   0.126    24,056        0.4806
Baseline                                2026.01  0.7225   0.1255   0.086    13,780        -