
Reasoning failure prediction and recovery on CRUXEval L2

Accuracy: 77

thought-tree-based classifier

Apr 18, 2026
Updated 1mo ago

Evaluation Results

Method    Links
2026.04    77      13
2026.04    75      22
2026.04    0.83    0.21
2026.04    0.74    0.06