Utility Evaluation on LM Utility Evaluation Dataset
Top result: CB, Utility Score 9.12
[Chart: Utility Score by result date (Apr 17, 2026)]
Evaluation Results
| Method | Model         | Date    | Utility Score |
|--------|---------------|---------|---------------|
| CB     | Mistral-Small | 2026.04 | 9.12          |
| Base   | Mistral-Small | 2026.04 | 8.90          |
| Base   | Mistral-Nemo  | 2026.04 | 8.78          |
| CB     | Mistral-Nemo  | 2026.04 | 8.72          |
| Goal   | Mistral-Small | 2026.04 | 8.69          |
| Goal   | Mistral-Nemo  | 2026.04 | 8.39          |
| Beam   | Mistral-Small | 2026.04 | 8.06          |
| Beam   | Mistral-Nemo  | 2026.04 | 7.67          |
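To cross-check the results table, the code below averages each method's Utility Score over the two models (Mistral-Small and Mistral-Nemo) and ranks the methods by that mean. This is a minimal sketch. It assumes the eight rows are transcribed exactly as listed, and the names `RESULTS`, `mean_by_method`, and `rank_methods` are hypothetical helpers, not part of the benchmark or its tooling.

```python
from statistics import mean

# Rows transcribed from the Evaluation Results table: (method, model, utility score).
RESULTS = [
    ("CB", "Mistral-Small", 9.12),
    ("Base", "Mistral-Small", 8.90),
    ("Base", "Mistral-Nemo", 8.78),
    ("CB", "Mistral-Nemo", 8.72),
    ("Goal", "Mistral-Small", 8.69),
    ("Goal", "Mistral-Nemo", 8.39),
    ("Beam", "Mistral-Small", 8.06),
    ("Beam", "Mistral-Nemo", 7.67),
]


def mean_by_method(rows):
    """Average each method's Utility Score over the models it was run with."""
    scores = {}
    for method, _model, score in rows:
        scores.setdefault(method, []).append(score)
    return {method: mean(vals) for method, vals in scores.items()}


def rank_methods(rows):
    """Return (method, mean score) pairs sorted from best to worst."""
    return sorted(mean_by_method(rows).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for method, avg in rank_methods(RESULTS):
        print(f"{method:<5} {avg:.3f}")
```

With these rows, the ranking by mean score is CB (8.92), Base (8.84), Goal (8.54), and Beam (7.865). Looking at each model separately, CB leads on Mistral-Small, while Base (8.78) slightly outscores CB (8.72) on Mistral-Nemo.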