Step-level error discrimination on MATH and GSM8k (test)

0.762 AUROC (Step-level Error Discrimination)

Fine-tuned

[Chart: AUROC; y-axis ticks 0.48952, 0.56026, 0.631, 0.70174; x-axis: May 7, 2026]
Updated 24d ago

Evaluation Results

Method    Links
2026.05    0.762    0.382    73.1
2026.05    0.554    0.161    16.4
2026.05    0.519    0.128    11
2026.05    0.5      0.118    11.8