Auto-formalization on ProofNet (test)
Top result: LongCat-Flash-Prover, 97.9 Pass@8
[Chart: Pass@8 over time; latest data point Mar 22, 2026]
Updated 26d ago
Evaluation Results
Method                                Date      Pass@8
LongCat-Flash-Prover (w/ TIR=true)    2026.03   97.9
Gemini-3 Pro                          2026.03   91.9
Claude-Opus-4.5                       2026.03   90.9
Kimi-K2.5                             2026.03   88.2
LongCat-Flash-Prover                  2026.03   87.1
DeepSeek-V3.2                         2026.03   81.8
Goedel-V2-Formalizer-32B              2026.03   79
Goedel-V2-Formalizer-8B               2026.03   76.9
StepFun-Formalizer-32B                2026.03   62.4
Kimina-Autoformalizer-7B              2026.03   55.4
StepFun-Formalizer-7B                 2026.03   53.8
ATF-32B                               2026.03   53.8
ATF-8B-Distilled                      2026.03   45.7