Counterexample generation on VERI-REASON
[Chart: Pass@1 by date. Labeled point: Ours, 213, Mar 19, 2026. Available metrics: Pass@1, Pass@4, Pass@9.]
Updated 27d ago
Evaluation Results
Method                 Attributes                Links    Pass@1  Pass@4  Pass@9
Ours                   Config=fine-tuned         2026.03     213     260     295
Leanabell-prover       Category=OPEN-SOURCED...  2026.03     144     210     231
Deepseek-prover-v2     Category=OPEN-SOURCED...  2026.03     144     203     234
STP-prover             Category=OPEN-SOURCED...  2026.03     131     151     170
Goedel-prover-v2       Category=OPEN-SOURCED...  2026.03      88     147     200
Kimina-prover-distill  Category=OPEN-SOURCED...  2026.03      66     114     156
GPT-4.1-mini           Category=PROPRIETARY R... 2026.03      54      97     150
Deepseek-R1            Category=PROPRIETARY R... 2026.03      51      75     105
Gemini-2.5-Flash       Category=PROPRIETARY R... 2026.03       5       9      15
Grok-3-mini            Category=PROPRIETARY R... 2026.03       4      10      19