Reasoning failure prediction on CodeLingua (L1)

Accuracy: 73

thought-tree-based classifier

[Chart: accuracy over time; y-axis ticks 67.8, 69.15, 70.5, 71.85; x-axis label Apr 18, 2026]
Updated 1mo ago

Evaluation Results

Date       Accuracy   Method   Links
2026.04    73
2026.04    68