Reasoning trace quality evaluation on eSNLI
[Chart: CRAFT, Grammar Score 5.9 (Apr 15, 2026). Metrics shown: Grammar Score, Rep-Step Penalty, Rep-Word Penalty.]
Updated 3d ago
Evaluation Results
Method  Method (details)    Links    Grammar Score  Rep-Step Penalty  Rep-Word Penalty
CRAFT   Model=o4-mini       2026.04  5.9            2                 1.4
CRAFT   Model=GPT-5.4-nano  2026.04  2.4            1.5               1.2