Multi-task Language Understanding on MMLU (Accuracy, Std, Speedup)
[Chart: MMLU Accuracy by date. Top entry: TALE, 71 (Oct 26, 2025). Available metrics: MMLU Accuracy, MMLU Std Dev, MMLU Speedup.]
Evaluation Results
| Method | Configuration | Date | MMLU Accuracy | MMLU Std Dev | MMLU Speedup |
|---|---|---|---|---|---|
| TALE | Backbone=Qwen 2.5 7B,... | 2025.10 | 71 | 19 | -16.6 |
| BSBA | Backbone=Qwen 2.5 7B,... | 2025.10 | 68.13 | - | -19.9 |
| Baseline | Backbone=Qwen 2.5 7B,... | 2025.10 | 68.1 | - | - |
| TALE | Backbone=Lucie 7B, Eva... | 2025.10 | 54 | 35 | -24.1 |
| TALE | Backbone=LLaMA 3.1 8B,... | 2025.10 | 53.8 | 22 | -2.9 |
| BSBA | Backbone=LLaMA 3.1 8B,... | 2025.10 | 50.2 | - | -26.4 |
| Baseline | Backbone=LLaMA 3.1 8B,... | 2025.10 | 48.8 | - | - |
| TALE | Backbone=Mistral 7B, E... | 2025.10 | 40.8 | 20 | -6.2 |
| Baseline | Backbone=Mistral 7B, E... | 2025.10 | 39.4 | - | - |
| BSBA | Backbone=Mistral 7B, E... | 2025.10 | 39 | - | -24.6 |
| BSBA | Backbone=Lucie 7B, Eva... | 2025.10 | 15 | - | -60.2 |
| Baseline | Backbone=Lucie 7B, Eva... | 2025.10 | 13 | - | - |
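The rows above are sorted by accuracy, which makes it hard to compare methods within one backbone. As a rough sketch, the accuracy gain of each method over the Baseline row of the same backbone can be computed as below. The accuracy values are copied from the results above; the dictionary layout and the helper name `gains_over_baseline` are illustrative assumptions, not part of the leaderboard.

```python
# MMLU Accuracy values copied from the results above, grouped by backbone.
# The grouping and all names here are illustrative, not the leaderboard's own.
accuracy = {
    "Qwen 2.5 7B": {"Baseline": 68.1, "BSBA": 68.13, "TALE": 71.0},
    "LLaMA 3.1 8B": {"Baseline": 48.8, "BSBA": 50.2, "TALE": 53.8},
    "Mistral 7B": {"Baseline": 39.4, "BSBA": 39.0, "TALE": 40.8},
    "Lucie 7B": {"Baseline": 13.0, "BSBA": 15.0, "TALE": 54.0},
}


def gains_over_baseline(scores):
    """Return each method's accuracy minus the Baseline accuracy (same units as the table)."""
    base = scores["Baseline"]
    return {
        method: round(value - base, 2)
        for method, value in scores.items()
        if method != "Baseline"
    }


gains = {backbone: gains_over_baseline(scores) for backbone, scores in accuracy.items()}

for backbone, per_method in gains.items():
    print(backbone, per_method)
```

Under these numbers, TALE is ahead of Baseline on every listed backbone, while BSBA falls slightly below Baseline on Mistral 7B. The differences are in the table's own accuracy units and ignore the reported standard deviations.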