Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Logical Reasoning on ProofWriter (Accuracy and Toxic Rate)

98.4Accuracy

PoT

28.7246.8164.982.99Feb 28, 2024Jul 2, 2024Nov 4, 2024Mar 10, 2025Jul 13, 2025Nov 15, 2025Mar 21, 2026
Updated 25d ago

Evaluation Results

MethodLinks
2026.01
98.4-
2026.01
98.2-
2026.01
94-
2026.01
93-
2026.01
90.2-
2026.01
89.9-
2026.01
88.7-
2026.01
88-
2026.01
86.6-
2026.01
85.8-
2026.01
85.2-
2026.01
84.6-
2026.01
82.7-
2026.03
79.7-
2024.02
78.26.4
2026.01
76.4-
2026.03
76.3-
2026.01
74.6-
2026.01
74-
2026.01
71.6-
2026.01
71.4-
2026.01
71.1-
2026.03
70.7-
2026.01
67.2-
2026.01
65.6-
2026.03
65-
2026.03
65-
2026.01
64.7-
2026.03
61.7-
2026.03
57.7-
2026.03
57.7-
2024.02
56.819.9
2026.01
56.8-
2026.01
52.4-
2026.03
51.7-
2026.03
50.3-
2026.03
49-
2026.03
43.3-
2026.01
42.8-
2026.01
42.8-
2026.01
38.8-
2026.01
35.8-
2026.01
32-
2026.01
31.4-