F2F

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Automated Theorem Proving | MiniF2F (test) | Success Rate: 99.6 | 93 |
| Theorem Proving | MiniF2F (val) | Success Rate: 63.9 | 59 |
| Formal Theorem Proving | miniF2F Isabelle (val) | Success Rate: 57 | 41 |
| Formal Theorem Proving | miniF2F Isabelle (test) | Success Rate: 51.2 | 39 |
| Theorem Proving | miniF2F Lean (test) | Pass@64: 52 | 24 |
| Autoformalization | miniF2F (test) | TC@1: 96 | 16 |
| Formal Theorem Proving | miniF2F (val) | Pass@1: 42.2 | 15 |
| Auto-formalization | MiniF2F (test) | Pass@8: 100 | 13 |
| Informal-to-formal proving | miniF2F (val) | Proven Theorems Rate: 25.8 | 11 |
| Theorem Proving | miniF2F Lean (val) | Cumulative Pass Rate: 60.2 | 10 |
| Lean theorem proving | MINIF2F 244 problems | Pass@8: 84.02 | 9 |
| Informal-to-Formal Proving | miniF2F (test) | Accuracy: 24.6 | 6 |
| Theorem Proving | miniF2F Lean (curriculum) | Pass@64: 32.1 | 3 |
Showing 13 of 13 rows