
Zero-shot Commonsense Reasoning on Commonsense Reasoning Benchmarks (LLaMA-2-13B)

Best BoolQ Accuracy (Zero-shot): 80.92 (Dense)


Evaluation Results

Method	Date	BoolQ	Task 2	Task 3	Task 4	Task 5	Task 6	Task 7	Avg.
Dense	2025.12	80.92	80.52	79.36	71.98	79.63	49.15	45	69.51
SlimGPT w/o	2025.12	80.37	78.45	77.07	71.51	75.84	45.14	43.6	67.43
Token Filtering	2025.12	80.18	78.62	77.78	72.22	78.96	48.04	45	68.69
Token Filtering	2025.12	80.09	78.56	77.82	71.03	77.57	47.95	44.2	68.17
Token Filtering	2025.12	79.79	77.15	76.07	71.35	77.44	47.35	44	67.65
Token Filtering	2025.12	79.76	77.04	74.56	71.03	71.72	45.22	42	65.9
SlimGPT w/o	2025.12	78.78	77.91	75.65	70.64	73.06	44.11	43	66.16
FLAP	2025.12	77.21	77.95	78.06	71.32	76.44	45.27	42	66.89
PP	2025.12	76.29	79.55	76.53	68.57	76.85	44.57	42	66.33
SlimGPT w/o	2025.12	76.27	76.44	72.76	70.8	70.83	41.21	41	64.19
FLAP	2025.12	75.81	75.61	74.74	69.59	73.83	43.57	41	64.88
FLAP	2025.12	74.37	74.42	70.45	68.17	70.22	42.29	39.2	62.73
FLAP	2025.12	74.31	70.49	58.39	62.96	61.67	36.83	37.2	57.41
PP	2025.12	73.82	78.47	75.82	67.78	75.42	42.95	42.4	65.24
PP	2025.12	70.3	76.74	71.67	63.65	71.08	39.96	42.6	62.29
SlimGPT w/o	2025.12	66.06	73.61	61.76	65.82	60.44	34.39	38	57.15
PP	2025.12	62.17	69.22	49.88	55.07	59.26	29.63	36.2	51.93
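As a quick sanity check on the table, the sketch below recomputes each row's average. It assumes that the last column is the unweighted mean of the seven task scores. The page does not state this, but the assumption matches most rows to two decimals. The sketch flags rows where the reported value differs from the recomputed mean by more than 0.01 points, and it lists each row's average drop relative to the Dense baseline. The helper names (`inconsistent_averages`, `drop_vs_dense`) and the 0.01 tolerance are illustrative choices, not part of the leaderboard. On the values as displayed, two rows are flagged: Token Filtering at 79.79 BoolQ and PP at 62.17 BoolQ. This may reflect rounding of the displayed scores or an error on the page. The original papers remain the authority.

```python
from statistics import mean

# Rows transcribed from the table: (method, seven task scores starting with BoolQ,
# reported last-column value). Methods repeat (different settings), so rows are
# identified by (method, BoolQ) below.
ROWS = [
    ("Dense", [80.92, 80.52, 79.36, 71.98, 79.63, 49.15, 45.0], 69.51),
    ("SlimGPT w/o", [80.37, 78.45, 77.07, 71.51, 75.84, 45.14, 43.6], 67.43),
    ("Token Filtering", [80.18, 78.62, 77.78, 72.22, 78.96, 48.04, 45.0], 68.69),
    ("Token Filtering", [80.09, 78.56, 77.82, 71.03, 77.57, 47.95, 44.2], 68.17),
    ("Token Filtering", [79.79, 77.15, 76.07, 71.35, 77.44, 47.35, 44.0], 67.65),
    ("Token Filtering", [79.76, 77.04, 74.56, 71.03, 71.72, 45.22, 42.0], 65.9),
    ("SlimGPT w/o", [78.78, 77.91, 75.65, 70.64, 73.06, 44.11, 43.0], 66.16),
    ("FLAP", [77.21, 77.95, 78.06, 71.32, 76.44, 45.27, 42.0], 66.89),
    ("PP", [76.29, 79.55, 76.53, 68.57, 76.85, 44.57, 42.0], 66.33),
    ("SlimGPT w/o", [76.27, 76.44, 72.76, 70.8, 70.83, 41.21, 41.0], 64.19),
    ("FLAP", [75.81, 75.61, 74.74, 69.59, 73.83, 43.57, 41.0], 64.88),
    ("FLAP", [74.37, 74.42, 70.45, 68.17, 70.22, 42.29, 39.2], 62.73),
    ("FLAP", [74.31, 70.49, 58.39, 62.96, 61.67, 36.83, 37.2], 57.41),
    ("PP", [73.82, 78.47, 75.82, 67.78, 75.42, 42.95, 42.4], 65.24),
    ("PP", [70.3, 76.74, 71.67, 63.65, 71.08, 39.96, 42.6], 62.29),
    ("SlimGPT w/o", [66.06, 73.61, 61.76, 65.82, 60.44, 34.39, 38.0], 57.15),
    ("PP", [62.17, 69.22, 49.88, 55.07, 59.26, 29.63, 36.2], 51.93),
]


def inconsistent_averages(rows, tol=0.01):
    """Return (method, BoolQ) for rows whose reported value differs from the
    recomputed unweighted mean of the seven scores by more than `tol`.
    Assumption: the last column is meant to be that unweighted mean."""
    flagged = []
    for method, scores, reported in rows:
        if abs(mean(scores) - reported) > tol:
            flagged.append((method, scores[0]))
    return flagged


def drop_vs_dense(rows):
    """Gap in points between the Dense row's reported average and each row's."""
    dense_avg = next(reported for method, _, reported in rows if method == "Dense")
    return [
        (method, scores[0], round(dense_avg - reported, 2))
        for method, scores, reported in rows
    ]


if __name__ == "__main__":
    print("Rows with inconsistent averages:", inconsistent_averages(ROWS))
    for method, boolq, gap in drop_vs_dense(ROWS):
        print(f"{method:16s} BoolQ={boolq:6.2f}  avg drop vs Dense={gap:5.2f}")
```

Keying rows by `(method, BoolQ)` rather than by method name alone keeps the repeated Token Filtering, SlimGPT, FLAP, and PP entries distinguishable.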