Multitask Language Understanding on MMLU (vbal)
[Chart: vbal Score, Apr 6, 2026. Top entry: Base, 65.7]
Updated 11d ago
Evaluation Results
| Method | Model | Date | vbal Score |
|---|---|---|---|
| Base | Tulu3.1-8B | 2026.04 | 65.7 |
| Base | OLMo2-7B | 2026.04 | 64 |
| Base | Llama3.1-8B | 2026.04 | 62.5 |
| Oracle-rephrase | Llama3.1-8B | 2026.04 | 8.6 |
| Oracle-rephrase | Tulu3.1-8B | 2026.04 | 6.6 |
| Oracle-rephrase | OLMo2-7B | 2026.04 | 4.5 |
| SIMPL. | Llama3.1-8B | 2026.04 | 4.3 |
| STRUCT. | Llama3.1-8B | 2026.04 | 3.7 |
| Reflect-and-rephrase | Llama3.1-8B | 2026.04 | 3.6 |
| PROF. | Llama3.1-8B | 2026.04 | 2.9 |
| PROF. | OLMo2-7B | 2026.04 | 1.9 |
| STRUCT. | OLMo2-7B | 2026.04 | 1.3 |
| PROF. | Tulu3.1-8B | 2026.04 | 0.9 |
| Reflect-and-rephrase | OLMo2-7B | 2026.04 | 0.6 |
| STRUCT. | Tulu3.1-8B | 2026.04 | 0.2 |
| SIMPL. | OLMo2-7B | 2026.04 | 0.2 |
| SIMPL. | Tulu3.1-8B | 2026.04 | 0 |
| Reflect-and-rephrase | Tulu3.1-8B | 2026.04 | 0 |