Commonsense Question Answering on CommonQA (accuracy, std, speedup)
[Chart: Accuracy by date. Top entry: TALE, 84.4 (Oct 26, 2025). Selectable metrics: Accuracy, Std Dev, Speedup (%).]
Updated 22d ago
Evaluation Results
| Method | Configuration | Links | Accuracy | Std Dev | Speedup (%) |
|---|---|---|---|---|---|
| TALE | Backbone=Qwen 2.5 7B,... | 2025.10 | 84.4 | 0.17 | -6.6 |
| BSBA | Backbone=Qwen 2.5 7B,... | 2025.10 | 80.5 | - | -19.9 |
| Baseline | Backbone=Qwen 2.5 7B,... | 2025.10 | 80.3 | - | - |
| TALE | Backbone=LLaMA 3.1 8B,... | 2025.10 | 73.8 | 0.15 | -8.8 |
| BSBA | Backbone=LLaMA 3.1 8B,... | 2025.10 | 73.1 | - | -17.6 |
| Baseline | Backbone=LLaMA 3.1 8B,... | 2025.10 | 72.9 | - | - |
| TALE | Backbone=Lucie 7B, Eva... | 2025.10 | 68.6 | 0.27 | -9.1 |
| TALE | Backbone=Mistral 7B, E... | 2025.10 | 64.4 | 0.19 | -12.3 |
| BSBA | Backbone=Mistral 7B, E... | 2025.10 | 61.6 | - | -21.5 |
| Baseline | Backbone=Mistral 7B, E... | 2025.10 | 61 | - | - |
| BSBA | Backbone=Lucie 7B, Eva... | 2025.10 | 54.6 | - | -48.2 |
| Baseline | Backbone=Lucie 7B, Eva... | 2025.10 | 54.2 | - | - |