Question Answering on ARC Challenge (Accuracy, Std, Speedup)
[Chart: Accuracy over time. Top result: TALE, 92 (Oct 26, 2025). Metrics shown: Accuracy, Std Dev, Speedup (%).]
Evaluation Results
Method    Details                    Date     Accuracy  Std Dev  Speedup (%)
TALE      Backbone=Qwen 2.5 7B,...   2025.10  92        0.15     -6.7
Baseline  Backbone=Qwen 2.5 7B,...   2025.10  86.55     -        -
BSBA      Backbone=Qwen 2.5 7B,...   2025.10  86.55     -        -19.9
TALE      Backbone=LLaMA 3.1 8B,...  2025.10  80.6      0.18     -11.7
Baseline  Backbone=LLaMA 3.1 8B,...  2025.10  79.4      -        -
TALE      Backbone=Mistral 7B, E...  2025.10  79.1      0.18     -18.5
BSBA      Backbone=LLaMA 3.1 8B,...  2025.10  77.6      -        -20.5
Baseline  Backbone=Mistral 7B, E...  2025.10  76.2      -        -
BSBA      Backbone=Mistral 7B, E...  2025.10  76.2      -        -24.6
TALE      Backbone=Lucie 7B, Eva...  2025.10  51.45     0.22     -22.1
BSBA      Backbone=Lucie 7B, Eva...  2025.10  48.8      -        -33.1
Baseline  Backbone=Lucie 7B, Eva...  2025.10  46        -        -
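One way to read the table is to compare each method with the Baseline on the same backbone. The sketch below computes that accuracy gap in percentage points. It is illustrative only. It assumes that rows sharing a shortened backbone label (for example "Qwen 2.5 7B") use the same evaluation setup, which the truncated configuration strings do not confirm. The helper name `gains_over_baseline` is hypothetical. The Speedup (%) column is not used, because this page does not explain its sign convention.

```python
# Rows transcribed from the Evaluation Results table: (method, backbone, accuracy).
# Backbone labels are hypothetical shortenings of the truncated "Backbone=..." strings.
ROWS = [
    ("TALE", "Qwen 2.5 7B", 92.0),
    ("Baseline", "Qwen 2.5 7B", 86.55),
    ("BSBA", "Qwen 2.5 7B", 86.55),
    ("TALE", "LLaMA 3.1 8B", 80.6),
    ("Baseline", "LLaMA 3.1 8B", 79.4),
    ("TALE", "Mistral 7B", 79.1),
    ("BSBA", "LLaMA 3.1 8B", 77.6),
    ("Baseline", "Mistral 7B", 76.2),
    ("BSBA", "Mistral 7B", 76.2),
    ("TALE", "Lucie 7B", 51.45),
    ("BSBA", "Lucie 7B", 48.8),
    ("Baseline", "Lucie 7B", 46.0),
]


def gains_over_baseline(rows):
    """Hypothetical helper: map (method, backbone) to accuracy minus the
    Baseline accuracy on the same backbone, rounded to 2 decimals."""
    # Assumes exactly one Baseline row per backbone label.
    baseline = {bb: acc for m, bb, acc in rows if m == "Baseline"}
    return {
        (m, bb): round(acc - baseline[bb], 2)
        for m, bb, acc in rows
        if m != "Baseline"
    }


if __name__ == "__main__":
    # Print one line per non-baseline row, sorted by method then backbone.
    for (method, backbone), delta in sorted(gains_over_baseline(ROWS).items()):
        print(f"{method:5s} {backbone:13s} {delta:+.2f}")
```

When run as a script, it prints one line for each non-baseline row. For example, TALE on Qwen 2.5 7B comes out at +5.45 points, and BSBA on the same backbone at +0.00.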