Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

F2F

Benchmarks

Task NameDataset NameSOTA ResultTrend
Automated Theorem ProvingMiniF2F (test)
Success Rate99.6
93
Theorem ProvingMiniF2F (val)
Success Rate63.9
59
Formal Theorem ProvingminiF2F Isabelle (val)
Success Rate57
41
Formal Theorem ProvingminiF2F Isabelle (test)
Success Rate51.2
39
Formal Theorem ProvingminiF2F rw (test)
Pass@875
24
Formal Theorem ProvingminiF2F rw (val)
Pass@881.1
24
Theorem ProvingminiF2F Lean (test)
Pass@6452
24
Automated Theorem ProvingminiF2F
Accuracy31.97
18
AutoformalizationminiF2F (test)
TC@196
16
Formal Theorem ProvingminiF2F (val)
Pass@142.2
15
Auto-formalizationMiniF2F (test)
Pass@8100
13
Formal Theorem ProvingminiF2F
Proof Success Rate66.31
12
Formal Theorem ProvingminiF2F
Average Token Cost228.64
12
Informal-to-formal provingminiF2F (val)
Proven Theorems Rate25.8
11
Theorem ProvingminiF2F Lean (val)
Cumulative Pass Rate60.2
10
Lean theorem provingMINIF2F 244 problems
Pass@884.02
9
Informal-to-Formal ProvingminiF2F (test)
Accuracy24.6
6
Automated Theorem ProvingminiF2F Easy Mode (test)
Solved Problems (Pass@32)215
5
Theorem ProvingminiF2F Lean (curriculum)
Pass@6432.1
3
Automated Theorem ProvingminiF2F Hard Mode (test)
Total Solved (Pass@32)204
2
Theorem AutoformalizationF2F mini
Objects Score3.14
1
Showing 21 of 21 rows