Reasoning trace quality evaluation on CosmosQA
[Figure: Grammar Score over time, with metric toggles for Grammar Score, Repetition Penalty (Step), and Repetition Penalty (Word). Leading entry: CRAFT, Grammar Score 2.1; date shown on the x-axis: Apr 15, 2026.]
Evaluation Results

Method  Configuration       Links    Grammar Score  Repetition Penalty (Step)  Repetition Penalty (Word)
CRAFT   Model=o4-mini       2026.04  2.1            2                          2.8
CRAFT   Model=GPT-5.4-nano  2026.04  1.6            1.7                        1.7