Auto-formalization on CombiBench
[Chart: Pass@8 by date. Highest point: 97, LongCat-Flash-Prover, Mar 22, 2026]
Updated 25d ago
Evaluation Results
Method                               Date      Pass@8
LongCat-Flash-Prover (w/ TIR=true)   2026.03   97
Claude-Opus-4.5                      2026.03   92
Kimi-K2.5                            2026.03   84
LongCat-Flash-Prover                 2026.03   83
Gemini-3 Pro                         2026.03   82
Goedel-V2-Formalizer-32B             2026.03   73
ATF-32B                              2026.03   70
DeepSeek-V3.2                        2026.03   65
Goedel-V2-Formalizer-8B              2026.03   61
ATF-8B-Distilled                     2026.03   59
StepFun-Formalizer-32B               2026.03   50
StepFun-Formalizer-7B                2026.03   40
Kimina-Autoformalizer-7B             2026.03   25
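
The page does not say how Pass@8 is computed. Under the common reading, a problem counts as solved if at least one of 8 sampled formalizations is accepted, and the score is the percentage of solved problems. The sketch below illustrates that reading only; the function name `pass_at_k`, the input layout, and the example data are hypothetical and do not come from the benchmark.

```python
from typing import Sequence


def pass_at_k(outcomes: Sequence[Sequence[bool]], k: int = 8) -> float:
    """Percentage of problems with at least one accepted attempt among the first k.

    outcomes[i][j] is True if attempt j on problem i was accepted.
    This is an assumed definition, not the benchmark's documented one.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not outcomes:
        raise ValueError("need at least one problem")
    solved = sum(1 for attempts in outcomes if any(attempts[:k]))
    return 100.0 * solved / len(outcomes)


# Made-up data: 4 problems, 8 attempts each.
example = [
    [False] * 7 + [True],          # solved on the last attempt
    [False] * 8,                   # never solved
    [True] + [False] * 7,          # solved on the first attempt
    [False, True] + [False] * 6,   # solved on the second attempt
]

print(pass_at_k(example, k=8))  # 75.0
print(pass_at_k(example, k=1))  # 25.0
```

On this reading, a score of 97 would mean that 97% of problems had at least one accepted formalization within 8 attempts.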