Auto-formalization on Prover-Bench
[Chart: Pass@8 by date. Labeled point: LongCat-Flash-Prover, Pass@8 100, Mar 22, 2026]
Updated 26d ago
Evaluation Results
Method                              Date      Pass@8
LongCat-Flash-Prover w/ TIR=true    2026.03   100
LongCat-Flash-Prover                2026.03   95.2
Claude-Opus-4.5                     2026.03   94.8
Goedel-V2-Formalizer-32B            2026.03   94.4
Gemini-3 Pro                        2026.03   93
Goedel-V2-Formalizer-8B             2026.03   93
Kimi-K2.5                           2026.03   91.7
ATF-32B                             2026.03   89.6
DeepSeek-V3.2                       2026.03   83
ATF-8B-Distilled                    2026.03   83
StepFun-Formalizer-32B              2026.03   73.5
Kimina-Autoformalizer-7B            2026.03   67.4
StepFun-Formalizer-7B               2026.03   65.2