Zero-shot Language Evaluation on Gauntlet 20 benchmarks (test)

Best result: 9.2 Average Normalized Accuracy (Prior-based)

Updated 4mo ago

Evaluation Results

Method	Links	Average Normalized Accuracy
Prior-based 2025.09		9.2	9.53	11.27	10.31	11.13	3.79
PPL-based 2025.09		8.22	9.98	11.91	7.34	7.91	3.96
DSIR 2025.09		7.56	7.03	6.84	7.31	12.67	3.97
FastText 2025.09		7.09	6.71	6.11	6.89	11.93	3.82
Prior-based 2025.09		6.65	5.03	9.13	4.22	11.21	3.66
vanilla 2025.09		5.78	5.52	0.44	6.14	13.22	3.59
DSIR 2025.09		5.6	5.68	4.93	1.97	11.6	3.8
FastText 2025.09		5.39	5.12	4.29	1.74	12.31	3.49
PPL-based 2025.09		5.26	5.47	6.53	2.9	7.84	3.58
vanilla 2025.09		4.96	4.96	1.81	1.47	12.83	3.7