Reasoning failure prediction and recovery on CRUXEval L1

Accuracy: 89

thought-tree-based classifier

Apr 18, 2026
Updated 1mo ago

Evaluation Results

Method    Links
2026.04
8965
2026.04
8880