Explanation-rating coherence evaluation on Baby
Chart: GPT Score by method over time; top result Curr-RLCER (83.34), Apr 7, 2026. Metrics available: GPT Score, Bert-Classifier Score, Human Annotator Score.
Evaluation Results
Method      Date     GPT Score  Bert-Classifier Score  Human Annotator Score
Curr-RLCER  2026.04  83.34      91.91                  86.17
PETER       2026.04  70.13      81.19                  69.15
CER         2026.04  69.88      79.96                  75.61
NRT         2026.04  68.24      80.49                  70.28
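The four methods can be re-ranked under each metric to compare how the metrics agree. This is a minimal Python sketch; the SCORES dictionary and the rank_by helper are hypothetical names introduced for illustration and are not part of the leaderboard. The values are copied from the Evaluation Results table. Curr-RLCER ranks first on all three metrics, but the order of the remaining methods depends on the metric. For example, CER is third by GPT Score and second by Human Annotator Score.

```python
# Scores copied from the Evaluation Results table (Baby dataset).
# SCORES and rank_by are hypothetical names used only for illustration.
SCORES = {
    "Curr-RLCER": {"GPT Score": 83.34, "Bert-Classifier Score": 91.91, "Human Annotator Score": 86.17},
    "PETER": {"GPT Score": 70.13, "Bert-Classifier Score": 81.19, "Human Annotator Score": 69.15},
    "CER": {"GPT Score": 69.88, "Bert-Classifier Score": 79.96, "Human Annotator Score": 75.61},
    "NRT": {"GPT Score": 68.24, "Bert-Classifier Score": 80.49, "Human Annotator Score": 70.28},
}

METRICS = ["GPT Score", "Bert-Classifier Score", "Human Annotator Score"]


def rank_by(metric):
    """Return method names sorted from highest to lowest score on `metric`."""
    return sorted(SCORES, key=lambda method: SCORES[method][metric], reverse=True)


if __name__ == "__main__":
    # Print one ranking line per metric, best method first.
    for metric in METRICS:
        print(f"{metric}: {' > '.join(rank_by(metric))}")
```

Because every score is on the same higher-is-better scale, sorting in descending order is enough. No normalization across metrics is attempted.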