Reasoning failure prediction and recovery on CRUXEval (L3)

Accuracy: 74

thought-tree-based classifier

[Chart: accuracy over time; y-axis ticks 72.96, 73.23, 73.5, 73.77; date shown: Apr 18, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.04
7427
2026.04
7315