Long-context language modeling evaluation on RULER 32K

Best average score (RULER 32K): 88.6 (Full KV)

Updated 1mo ago

Evaluation Results

Method (date)	Links	Average Score (RULER 32K)
Full KV 2025.10		88.6	-	-	-	-	-	-	-	-	-
AdaKV + OBCACHE-K 2025.10		88.5	-	-	-	-	-	-	-	-	-
AdaKV + OBCACHE-V&K 2025.10		85.2	-	-	-	-	-	-	-	-	-
Full 2026.03		75.4	99.6	98	98	97	46.6	94.15	22.28	69.78	53.2
AdaKV + OBCACHE-V&K 2025.10		75.4	-	-	-	-	-	-	-	-	-
MixedDimKV-H 2026.03		62.64	100	95.2	96	38.4	31.05	64.35	21.6	65.8	51.4
HeadKV 2026.03		60.43	100	90.6	92	44	24.8	56.5	20.92	64.27	50.8
MixedDimKV 2026.03		49.84	98.4	85.6	92.4	8.4	7.75	17.1	21.24	66.47	51.2
SnapKV 2026.03		49.16	100	85.8	89.8	13.6	7.25	16.5	20.88	62.77	45.8
PyramidKV 2026.03		45.31	100	80.6	78.8	14.4	3.69	5.05	20.48	60.27	44.6
H2O 2025.10		21.1	-	-	-	-	-	-	-	-	-
H2O 2026.03		5.85	0.8	0	0	0	0	0	4.04	24.2	23.6