Commonsense Reasoning on XCSQA
[Chart: CLCall Score, AEN Score, and A¬EN Score over time. Highlighted result: Qwen2.5-14B, CLCall Score 64.6, Mar 4, 2026.]
Evaluation Results
Method          Configuration                Date     CLCall Score  AEN Score  A¬EN Score
Qwen2.5-14B     Model=Qwen2.5-14B            2026.03  64.6          87         56.9
Aya-Expanse-8B  Model=Aya-Expanse-8B         2026.03  62.6          78         54.4
Qwen3-14B       Model=Qwen3-14B              2026.03  61.9          77.6       54
Llama3.1-8B     Model=Llama3.1-8B            2026.03  60.2          67.5       47.7
Gemma3-12B-pt   Model=Gemma3-12B-pt          2026.03  58.3          66         47.2
DCO             Base Model=Llama3.1-8B       2026.03  9.1           0.2        1.3
DCO             Base Model=Qwen3-14B         2026.03  7.1           1.1        3.8
DCO             Base Model=Qwen2.5-14B       2026.03  6.8           -2.5       4.7
DCO             Base Model=Aya-Expanse-8B    2026.03  6.4           0.6        3.7
DCO             Base Model=Gemma3-12B-pt     2026.03  4.6           0.1        3.6