Counterexample generation on VERI-FORMALIZE
[Chart: Pass@1. Labeled point: Ours, 174, Mar 19, 2026. Available metrics: Pass@1, Pass@4, Pass@9.]
Updated 27d ago
Evaluation Results

| Method | Tag | Date | Pass@1 | Pass@4 | Pass@9 |
|---|---|---|---|---|---|
| Ours | Config=fine-tuned | 2026.03 | 174 | 255 | 313 |
| Leanabell-prover | Category=OPEN-SOURCED... | 2026.03 | 111 | 198 | 228 |
| STP-prover | Category=OPEN-SOURCED... | 2026.03 | 99 | 150 | 171 |
| Deepseek-prover-v2 | Category=OPEN-SOURCED... | 2026.03 | 69 | 135 | 186 |
| Goedel-prover-v2 | Category=OPEN-SOURCED... | 2026.03 | 63 | 165 | 201 |
| GPT-4.1-mini | Category=PROPRIETARY R... | 2026.03 | 31 | 87 | 137 |
| Deepseek-R1 | Category=PROPRIETARY R... | 2026.03 | 27 | 72 | 102 |
| Kimina-prover-distill | Category=OPEN-SOURCED... | 2026.03 | 18 | 141 | 249 |
| Gemini-2.5-Flash | Category=PROPRIETARY R... | 2026.03 | 2 | 8 | 13 |
| Grok-3-mini | Category=PROPRIETARY R... | 2026.03 | 2 | 8 | 19 |