Auto-formalization on Putnam-Bench
Best result: LongCat-Flash-Prover, 98.1 Pass@8
[Chart: Pass@8 by method, dated Mar 22, 2026]
Updated 26d ago
Evaluation Results

Method                              Date     Pass@8
LongCat-Flash-Prover (w/ TIR=true)  2026.03  98.1
Claude-Opus-4.5                     2026.03  93.5
Gemini-3 Pro                        2026.03  90.8
LongCat-Flash-Prover                2026.03  89.9
Goedel-V2-Formalizer-32B            2026.03  85.9
Kimi-K2.5                           2026.03  82.8
Goedel-V2-Formalizer-8B             2026.03  80.4
ATF-32B                             2026.03  77.5
ATF-8B-Distilled                    2026.03  70.7
StepFun-Formalizer-32B              2026.03  65.1
Kimina-Autoformalizer-7B            2026.03  59.3
StepFun-Formalizer-7B               2026.03  55.4
DeepSeek-V3.2                       2026.03  46.7
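The page does not say how Pass@8 is computed. The sketch below assumes the common convention: a problem counts as solved if at least one of k = 8 formalization attempts is judged correct, and the benchmark score is the mean over problems, expressed as a percentage. When more than k samples are drawn per problem, the widely used unbiased estimator 1 - C(n-c, k) / C(n, k) applies. The function names and the (n, c) input format are hypothetical, and so is the correctness criterion for a formalization (for example, passing a checker or a human review). None of these come from this leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n attempts sampled, c of them judged correct.

    Hypothetical helper; assumes the standard pass@k definition, not this site's code.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if n < k:
        raise ValueError("need at least k samples per problem")
    if n - c < k:
        # Every size-k subset of the n attempts contains at least one correct attempt.
        return 1.0
    # Probability that a random size-k subset contains no correct attempt, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 8) -> float:
    """Mean per-problem pass@k over (n, c) pairs, as a percentage (assumed reporting format)."""
    if not results:
        raise ValueError("no problems given")
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Four toy problems, eight attempts each; only the first has no correct attempt.
    toy = [(8, 0), (8, 3), (8, 8), (8, 1)]
    print(benchmark_pass_at_k(toy))  # 75.0
```

With exactly n = k = 8 attempts per problem, the estimator reduces to "solved if any attempt is correct," so each problem contributes either 0 or 1.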