ProofWriter

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | ProofWriter (test) | Accuracy: 92.32 | 57 |
| Logical Reasoning | ProofWriter | Accuracy: 98.4 | 44 |
| Logical Reasoning | ProofWriter | Accuracy: 75 | 43 |
| Logical Reasoning | ProofWriter | Accuracy: 99.7 | 24 |
| Deductive Reasoning | ProofWriter | End-to-end Accuracy: 99.67 | 21 |
| Logical Reasoning | ProofWriter OWA (3-hop) | Accuracy: 76.6 | 20 |
| Logical Reasoning | ProofWriter CWA 3-hop | Accuracy: 73.8 | 20 |
| Deductive Reasoning | ProofWriter | Pass@1: 97.4 | 18 |
| Reasoning quality evaluation | PROOFWRITER | Somers' D: 0.339 | 15 |
| Explanation Refinement | ProofWriter | Initial Score: 92 | 15 |
| Reasoning | ProofWriter | Accuracy: 65 | 14 |
| Logical Reasoning | ProofWriter (held-out) | Performance: 0.5483 | 14 |
| Deductive logical reasoning | ProofWriter (test) | ExcRate: 100 | 12 |
| Logical Reasoning | ProofWriter depth-5 OWA setting | Accuracy (ProofWriter d5 OWA): 71.95 | 8 |
| Deductive Reasoning | ProofWriter | Calibrated Accuracy: 92.1 | 8 |
| Logical Reasoning | ProofWriter | Accuracy: 81.3 | 7 |
| Logical Reasoning | ProofWriter | Acc: 92.2 | 6 |
| Mathematical Reasoning | ProofWriter | Score: 0.5312 | 3 |
| Logical Reasoning | ProofWriter | Pass@1: 73.23 | 2 |
| Deductive logical reasoning | ProofWriter 600 records (test) | Exc. Rate: - | 0 |