Language Modeling Inference on Qwen2.5-7B (128K context length)
[Chart: Decode Latency (ms/token) over time, with a Speedup vs MLA view — FastMKA leads at 18.4 ms/token as of Mar 21, 2026. Updated 25d ago.]
Evaluation Results

| Method | Setup | Links | Decode Latency (ms/token) | Speedup vs MLA |
|---|---|---|---|---|
| FastMKA | Batch size=1, Precisio... | 2026.03 | 18.4 | 1.78 |
| MLA | Batch size=1, Precisio... | 2026.03 | 32.7 | - |
| GQA | Batch size=1, Precisio... | 2026.03 | 49.8 | - |
| MHA | Batch size=1, Precisio... | 2026.03 | 58.3 | - |
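The "Speedup vs MLA" column is the MLA baseline latency divided by each method's latency (32.7 / 18.4 ≈ 1.78 for FastMKA). A minimal sketch reproducing that arithmetic from the table's numbers, with the implied single-stream decode throughput (assuming speedup is computed this way; the page does not state the formula):

```python
# Decode latencies in ms/token, taken from the table above (batch size = 1).
latencies = {"FastMKA": 18.4, "MLA": 32.7, "GQA": 49.8, "MHA": 58.3}

baseline = latencies["MLA"]  # speedups on the page are reported relative to MLA

for method, ms in latencies.items():
    speedup = baseline / ms        # higher is better; MLA itself is 1.00x
    tokens_per_s = 1000.0 / ms     # throughput implied by per-token latency
    print(f"{method:8s} {ms:5.1f} ms/token  {speedup:4.2f}x vs MLA  {tokens_per_s:5.1f} tok/s")
```

Running this gives 1.78x for FastMKA, matching the table, and shows that GQA and MHA would come out below 1.0x against the MLA baseline.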