Counterexample generation on FOR-COUNTER
[Chart: Pass@1 by date. Top entry: Ours, 222 (Mar 19, 2026). Available metrics: Pass@1, Pass@4, Pass@9.]
Updated 27d ago
Evaluation Results
Method                 Category / Config          Date     Pass@1  Pass@4  Pass@9
Ours                   Config=fine-tuned          2026.03  222     274     302
Deepseek-prover-v2     Category=OPEN-SOURCED...   2026.03  127     200     224
Leanabell-prover       Category=OPEN-SOURCED...   2026.03  106     159     181
STP-prover             Category=OPEN-SOURCED...   2026.03  101     157     179
Goedel-prover-v2       Category=OPEN-SOURCED...   2026.03  89      177     215
Deepseek-R1            Category=PROPRIETARY R...  2026.03  61      135     158
Kimina-prover-distill  Category=OPEN-SOURCED...   2026.03  31      109     165
Grok-3-mini            Category=PROPRIETARY R...  2026.03  30      74      101
Gemini-2.5-Flash       Category=PROPRIETARY R...  2026.03  21      66      82
GPT-4.1-mini           Category=PROPRIETARY R...  2026.03  19      65      103