
ProofWriter

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | ProofWriter | Accuracy 98.4 | 44 |
| Logical Reasoning | ProofWriter (test) | Accuracy 92.32 | 36 |
| Logical Reasoning | ProofWriter | Accuracy 99.7 | 24 |
| Logical Reasoning | ProofWriter | Accuracy 68.2 | 22 |
| Deductive Reasoning | ProofWriter | End-to-end Accuracy 99.67 | 21 |
| Deductive Reasoning | ProofWriter | Pass@1 97.4 | 18 |
| Reasoning quality evaluation | PROOFWRITER | Somers' D 0.339 | 15 |
| Explanation Refinement | ProofWriter | Initial Score 92 | 15 |
| Reasoning | ProofWriter | Accuracy 65 | 14 |
| Logical Reasoning | ProofWriter (held-out) | Performance 0.5483 | 14 |
| Deductive logical reasoning | ProofWriter (test) | ExcRate 100 | 12 |
| Logical Reasoning | ProofWriter depth-5 OWA setting | Accuracy (ProofWriter d5 OWA) 71.95 | 8 |
| Deductive Reasoning | ProofWriter | Calibrated Accuracy 92.1 | 8 |
| Logical Reasoning | ProofWriter | Accuracy 81.3 | 7 |
| Deductive logical reasoning | ProofWriter 600 records (test) | Exc. Rate - | 0 |