Reasoning Trace Quality Evaluation on DROP
[Chart: Grammar, Rep-Step, and Rep-Word scores by date. Highlighted point: CRAFT, Grammar 2.8, Apr 15, 2026.]
Updated 3d ago
Evaluation Results
| Method | Grammar | Rep-Step | Rep-Word |
| --- | --- | --- | --- |
| CRAFT (Model=o4-mini, 2026.04) | 2.8 | 2.7 | 2.2 |
| CRAFT (Model=GPT-5.4-nano, 2026.04) | 1.4 | 1.8 | 2.1 |