
Sentence-level error detection on DeltaBench CoT Diagnosis 1.0 (test)

Precision: 43.2

GPT-5 (BIG-Bench Prompt)

[Chart: y-axis ticks 3.576, 13.863, 24.15, 34.437; x-axis label Mar 22, 2026]
Updated 25d ago

Evaluation Results

Method | Links
2026.03 | 43.2 | 65.8 | 47
2026.03 | 30.6 | 80.1 | 38.6
2026.03 | 5.1 | 4.1 | 4.4