# Language Modeling Inference on Qwen2.5-7B (32K context length)
[Figure: Decoding Latency (ms/token) over time, with an alternate view for Speedup vs MLA. FastMKA reaches 10.3 ms/token as of Mar 21, 2026.]
## Evaluation Results

| Method | Setup | Date | Decoding Latency (ms/token) | Speedup vs MLA |
| --- | --- | --- | --- | --- |
| FastMKA | Batch size=1, Precisio... | 2026.03 | 10.3 | 1.59 |
| MLA | Batch size=1, Precisio... | 2026.03 | 16.4 | - |
| GQA | Batch size=1, Precisio... | 2026.03 | 24.3 | - |
| MHA | Batch size=1, Precisio... | 2026.03 | 28.6 | - |
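The Speedup vs MLA column appears to be the MLA baseline latency divided by each method's latency. A minimal Python sketch reproducing it from the table above, assuming that definition (the dictionary name and print format are illustrative, not part of the benchmark):

```python
# Reproduce the "Speedup vs MLA" column, assuming speedup is defined as
# MLA decoding latency divided by the method's decoding latency.
latency_ms_per_token = {
    "FastMKA": 10.3,
    "MLA": 16.4,
    "GQA": 24.3,
    "MHA": 28.6,
}

baseline = latency_ms_per_token["MLA"]
for method, latency in latency_ms_per_token.items():
    speedup = baseline / latency
    print(f"{method}: {latency:.1f} ms/token, {speedup:.2f}x vs MLA")
```

For FastMKA this gives 16.4 / 10.3 ≈ 1.59, matching the reported value.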