Auto-formalization on MiniF2F (test)
[Chart: Pass@8 by date; labeled point: LongCat-Flash-Prover, 100, Mar 22, 2026]
Updated 26d ago
Evaluation Results

Method                               Date      Pass@8
LongCat-Flash-Prover (w/ TIR=true)   2026.03   100
LongCat-Flash-Prover                 2026.03   99.2
Kimi-K2.5                            2026.03   98.4
Goedel-V2-Formalizer-8B              2026.03   98.4
Goedel-V2-Formalizer-32B             2026.03   98.4
Claude-Opus-4.5                      2026.03   98
ATF-32B                              2026.03   98
DeepSeek-V3.2                        2026.03   97.5
Gemini-3 Pro                         2026.03   97.5
StepFun-Formalizer-7B                2026.03   96.7
StepFun-Formalizer-32B               2026.03   95.9
ATF-8B-Distilled                     2026.03   95.1
Kimina-Autoformalizer-7B             2026.03   92.2