Language Modeling Inference on Qwen2.5-7B (8K context length)

Decode Latency (ms/token): 7.1

FastMKA

Updated 4mo ago

Evaluation Results

Method	Links	Decode Latency (ms/token)
FastMKA 2026.03		7.1	1.44
MLA 2026.03		10.2	-
GQA 2026.03		14.8	-
MHA 2026.03		16.8	-
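
The latencies above can be turned into relative speedups with simple arithmetic. The following is a minimal Python sketch, not part of the original leaderboard; the names `latency_ms`, `speedup_over`, and `tokens_per_second` are illustrative assumptions, and the throughput figure assumes a single sequence decoded one token at a time.

```python
# Decode latencies (ms/token) copied from the results table above.
latency_ms = {"FastMKA": 7.1, "MLA": 10.2, "GQA": 14.8, "MHA": 16.8}


def speedup_over(baseline, method="FastMKA"):
    """Ratio of the baseline's latency to the method's latency (higher is faster)."""
    return latency_ms[baseline] / latency_ms[method]


def tokens_per_second(method):
    """Single-stream decode throughput implied by the per-token latency (assumption)."""
    return 1000.0 / latency_ms[method]


if __name__ == "__main__":
    for name in ("MLA", "GQA", "MHA"):
        print(f"FastMKA vs {name}: {speedup_over(name):.2f}x")
    print(f"FastMKA throughput: {tokens_per_second('FastMKA'):.1f} tokens/s")
```

The ratio against MLA rounds to 1.44, which coincides with the unlabeled value in the FastMKA row; the source does not name that column, so this reading is only a guess.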