Language Model Evaluation on MMLU, GSM8K, and BBH
[Chart: MMLU Accuracy. Top result: Grad Sim, 62.07 (Feb 6, 2025). Metrics available: MMLU Accuracy, GSM8K Accuracy, BBH Accuracy, Average Accuracy.]
Updated 2d ago
Evaluation Results
| Method | Details | Links | MMLU Accuracy | GSM8K Accuracy | BBH Accuracy | Average Accuracy |
|---|---|---|---|---|---|---|
| Grad Sim | training seed=1337, pi... | 2025.02 | 62.07 | 56.71 | 64.44 | 61.07 |
| BipCov | embedding backend=BGE... | 2025.02 | 61.81 | 61.71 | 67.14 | 63.56 |
| RepSim | training seed=1337, pi... | 2025.02 | 61.42 | 58.45 | 66.2 | 62.03 |
| LESS | training seed=1337, pi... | 2025.02 | 61.1 | 57.77 | 64.68 | 61.18 |
| RDS+ | training seed=1337, pi... | 2025.02 | 60.63 | 61.41 | 66.23 | 62.75 |
| Random Avg | training seed=1337, pi... | 2025.02 | 60.39 | 59.64 | 66.4 | 62.14 |
| BM25 | training seed=1337, pi... | 2025.02 | 59.85 | 58.98 | 62.63 | 60.49 |
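
The page does not say how Average Accuracy is computed. The reported values look consistent with an unweighted mean of the MMLU, GSM8K, and BBH scores, and the sketch below checks that assumption against the numbers listed above. The names `RESULTS`, `mean_accuracy`, `max_deviation`, and `rank_by` are hypothetical helpers introduced for illustration only; they are not part of the leaderboard or of any method's code.

```python
# Scores copied from the results above:
# (MMLU, GSM8K, BBH, reported Average Accuracy).
RESULTS = {
    "Grad Sim":   (62.07, 56.71, 64.44, 61.07),
    "BipCov":     (61.81, 61.71, 67.14, 63.56),
    "RepSim":     (61.42, 58.45, 66.20, 62.03),
    "LESS":       (61.10, 57.77, 64.68, 61.18),
    "RDS+":       (60.63, 61.41, 66.23, 62.75),
    "Random Avg": (60.39, 59.64, 66.40, 62.14),
    "BM25":       (59.85, 58.98, 62.63, 60.49),
}


def mean_accuracy(mmlu, gsm8k, bbh):
    """Unweighted mean of the three benchmark scores (an assumption)."""
    return (mmlu + gsm8k + bbh) / 3


def max_deviation(results):
    """Largest gap between the recomputed mean and the reported average."""
    return max(
        abs(mean_accuracy(*row[:3]) - row[3]) for row in results.values()
    )


def rank_by(results, column):
    """Method names sorted by one score column, highest first.

    column: 0 = MMLU, 1 = GSM8K, 2 = BBH, 3 = reported average.
    """
    return sorted(results, key=lambda name: results[name][column], reverse=True)


if __name__ == "__main__":
    print(f"max deviation from reported average: {max_deviation(RESULTS):.4f}")
    print("by MMLU:   ", rank_by(RESULTS, 0))
    print("by average:", rank_by(RESULTS, 3))
```

Ranking by a single column shows that the ordering depends on the metric chosen: Grad Sim leads on MMLU Accuracy, while BipCov leads on Average Accuracy. The small gaps between recomputed and reported averages are what rounding of the displayed scores would produce, though the page itself does not confirm the formula.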