General Language Modeling on BIG-Bench
[Chart: top Accuracy 85.6 (TALE), Oct 26, 2025. Metrics: Accuracy, Std Dev, Inference Speedup (%)]
Updated 22d ago
Evaluation Results
Method    Configuration (truncated)   Date      Accuracy   Std Dev   Inference Speedup (%)
TALE      Backbone=LLaMA 3.1 8B,...   2025.10   85.6       0.25      -14.4
TALE      Backbone=Qwen 2.5 7B,...    2025.10   81.6       0.14      -19.9
BSBA      Backbone=Qwen 2.5 7B,...    2025.10   81.6       -         -19.9
Baseline  Backbone=Qwen 2.5 7B,...    2025.10   79.2       -         -
Baseline  Backbone=LLaMA 3.1 8B,...   2025.10   77.2       -         -
BSBA      Backbone=LLaMA 3.1 8B,...   2025.10   76.4       -         -32.2
TALE      Backbone=Mistral 7B, E...   2025.10   75.4       0.22      -28
TALE      Backbone=Lucie 7B, Eva...   2025.10   75         0.25      -27.1
BSBA      Backbone=Mistral 7B, E...   2025.10   72.6       -         -33.8
BSBA      Backbone=Lucie 7B, Eva...   2025.10   71         -         -45.1
Baseline  Backbone=Mistral 7B, E...   2025.10   70.4       -         -
Baseline  Backbone=Lucie 7B, Eva...   2025.10   67.4       -         -
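
The accuracies listed above are easier to compare once each method is set against the Baseline row for the same backbone. The following is a minimal Python sketch of that arithmetic, not part of the leaderboard itself. The values are transcribed from the results above; the names ACCURACY, GAINS, and gain_over_baseline are hypothetical, chosen only for this illustration.

```python
# Accuracy values transcribed from the results above: (method, backbone) -> accuracy.
# The dictionary layout and all names here are illustrative assumptions.
ACCURACY = {
    ("TALE", "LLaMA 3.1 8B"): 85.6,
    ("BSBA", "LLaMA 3.1 8B"): 76.4,
    ("Baseline", "LLaMA 3.1 8B"): 77.2,
    ("TALE", "Qwen 2.5 7B"): 81.6,
    ("BSBA", "Qwen 2.5 7B"): 81.6,
    ("Baseline", "Qwen 2.5 7B"): 79.2,
    ("TALE", "Mistral 7B"): 75.4,
    ("BSBA", "Mistral 7B"): 72.6,
    ("Baseline", "Mistral 7B"): 70.4,
    ("TALE", "Lucie 7B"): 75.0,
    ("BSBA", "Lucie 7B"): 71.0,
    ("Baseline", "Lucie 7B"): 67.4,
}


def gain_over_baseline(accuracy):
    """Return {(method, backbone): accuracy minus that backbone's Baseline}."""
    gains = {}
    for (method, backbone), value in accuracy.items():
        if method == "Baseline":
            continue  # the Baseline rows are the reference, not a result
        baseline = accuracy[("Baseline", backbone)]
        # Round to one decimal to match the precision of the listed scores.
        gains[(method, backbone)] = round(value - baseline, 1)
    return gains


GAINS = gain_over_baseline(ACCURACY)

# Print the differences in accuracy points, largest gain first.
for (method, backbone), gain in sorted(GAINS.items(), key=lambda kv: -kv[1]):
    print(f"{method:<5} {backbone:<13} {gain:+.1f}")
```

Under these assumptions, a positive value means the method scores above its backbone's Baseline and a negative value means it scores below. The sketch covers only the Accuracy column and leaves the Std Dev and Inference Speedup (%) columns out.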