
Language Modeling Inference on Qwen2.5-7B (64K Context Length)

13.6 Decode Latency (ms/token)

FastMKA

Updated 4mo ago

Evaluation Results

Method	Links	Decode Latency (ms/token)	(unlabeled)
FastMKA 2026.03		13.6	1.68
MLA 2026.03		22.8	-
GQA 2026.03		33.4	-
MHA 2026.03		39.2	-