Language Modeling Inference on Qwen2.5-7B (4K Context Length)

Decode Latency (ms/token): 6.2

FastMKA

Updated 4mo ago

Evaluation Results

Method	Decode Latency (ms/token)	(unlabeled)
FastMKA 2026.03	6.2	1.4
MLA 2026.03	8.7	-
GQA 2026.03	12.4	-
MHA 2026.03	14.2	-