Formal Theorem Proving on a large-scale benchmark of 2,000 problems (test)
[Chart: FR Rate over time. Best result: 0.813 (TheoremForge, Jan 24, 2026). Available metrics: FR Rate, PR Rate, VR Rate, Expert Calls, Avg Cost.]
Updated 4d ago
Evaluation Results
Method                                  Date     FR Rate  PR Rate  VR Rate  Expert Calls  Avg Cost
TheoremForge (Backbone=Gemini-3-Flash)  2026.01  0.813    0.1445   0.126    24,056        0.4806
Baseline                                2026.01  0.7225   0.1255   0.086    13,780        -