
Common Sense Reasoning on HellaSwag (acc_n)

95.7 Accuracy (acc_n)

Always Tell Me The Odds

Updated 1mo ago

Evaluation Results

Method	Accuracy (acc_n)
Always Tell Me The Odds 2025.05	95.7	-
Zamba-7B 2025.07	76.4	-
Always Tell Me The Odds 2025.05	75.5	-
GPT-4o 2025.05	75.4	-
Always Tell Me The Odds 2025.05	75.3	-
Llama-3-Instruct 2025.05	74.9	-
LLaMA-3-8B-Lizard 2025.07	73.6	-
LLaMA-3-8B 2025.07	73.1	-
Mamba2-LLaMA-3 2025.07	71.5	-
Always Tell Me The Odds 2025.05	70.2	-
Always Tell Me The Odds 2025.05	67.2	-
StripedHyena-Nous-7B 2025.07	66.4	-
Llama-3.2-3B 2026.04	65	-
Llama-3.2-3B 2026.04	65	-
Llama-3.2-3B 2026.04	65	-
DeepSeek-R1-Distill-Qwen-32B 2025.05	57.8	-
Pruner-Zero 2026.06	54.7	-
Llama-3.2-1B 2026.04	54	-
Llama-3.2-1B 2026.04	54	-
Llama-3.2-1B 2026.04	54	-
DOT-MoE 2026.06	53.9	-
DISP-LLM 2026.06	46.3	-
ShortGPT 2026.06	43.7	-
SparseGPT 2026.06	43.3	-
RoBERTa-L 2025.05	42	-
Wanda 2026.06	40.9	-
Glauber-M 2026.05	40.5	-
Original (uncompressed) 2026.05	40.41	52.02
LLM Surgeon 2026.06	40.3	-
GPT-2-M 2026.05	38.3	-
Mamba-2 2026.04	37.7	-
Mamba-2 + PoST 2026.04	37.5	-
Glauber-M 2026.05	37.4	-
(Gong et al., 2025)-M 2026.05	37.2	-
K-OBD 2026.06	36.8	-
SVD-LLM V1 2026.05	34.49	43.53
SliceGPT 2026.06	33	-
SVD-LLM V2 2026.05	32.4	40.29
RWKV-7 2026.04	32.1	-
RWKV-7 + PoST 2026.04	32.1	-
Gated DeltaNet 2026.04	31.9	-
Gated DeltaNet + PoST 2026.04	31.5	-
SEDD-M 2026.05	31.5	-
Mamba-2 + PoST 2026.04	31.3	-
Mamba-2 2026.04	31.1	-
Pure-bundle GBD (λ=0) 2026.05	27.85	30.33
Basis-Sharing-core 2026.05	26.18	26.76
CART 2026.05	-	26.49
CART 2026.05	-	27.1
CART 2026.05	-	27.05
CART 2026.05	-	28.04
Mamba2 2026.05	-	44.15
GLA 2026.05	-	41.93
GLA-Hedgehog 2026.05	-	40.48
Gated DeltaNet 2026.05	-	43.48
CCQ-GLA 2026.05	-	43.45
CCQ-Gated DeltaNet 2026.05	-	44.61
Transformer 2026.05	-	43.12
Mamba2 2026.05	-	57
GLA 2026.05	-	55.44
GLA-Hedgehog 2026.05	-	55.9
Gated DeltaNet 2026.05	-	56.66
CCQ-GLA 2026.05	-	59.14
CCQ-Gated DeltaNet 2026.05	-	56.82
Transformer 2026.05	-	53.44