Language Reasoning on Language Reasoning Average
Top result: 73.25 Accuracy (FP16)
[Chart: Accuracy by submission date (Mar 18, 2026)]
Evaluation Results
Method      Backbone       Date      Accuracy
FP16        Qwen2.5-14B    2026.03   73.25
FP16        Llama-2-13B    2026.03   67.23
NSDS        Qwen2.5-14B    2026.03   66.15
ZD          Qwen2.5-14B    2026.03   65.42
KurtBoost   Qwen2.5-14B    2026.03   65.13
MSE         Qwen2.5-14B    2026.03   64.29
EWQ         Qwen2.5-14B    2026.03   63.36
NSDS        Llama-2-13B    2026.03   63.01
KurtBoost   Llama-2-13B    2026.03   62.08
EWQ         Llama-2-13B    2026.03   61.68
ZD          Llama-2-13B    2026.03   60.99
MSE         Llama-2-13B    2026.03   59.91