Language Modeling Inference on Qwen2.5-7B (128K context length)
[Chart: Decode Latency (ms/token) over time, with a Speedup vs MLA view — FastMKA leads at 18.4 ms/token as of Mar 21, 2026. Updated 25d ago.]
Evaluation Results

| Method | Setup | Links | Decode Latency (ms/token) | Speedup vs MLA |
|---|---|---|---|---|
| FastMKA | Batch size=1, Precisio... | 2026.03 | 18.4 | 1.78 |
| MLA | Batch size=1, Precisio... | 2026.03 | 32.7 | - |
| GQA | Batch size=1, Precisio... | 2026.03 | 49.8 | - |
| MHA | Batch size=1, Precisio... | 2026.03 | 58.3 | - |
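The "Speedup vs MLA" column is the MLA baseline latency divided by each method's latency (32.7 / 18.4 ≈ 1.78 for FastMKA). A minimal sketch reproducing that arithmetic from the table's numbers, with the implied single-stream decode throughput (assuming speedup is computed this way; the page does not state the formula):

```python
# Decode latencies in ms/token, taken from the table above (batch size = 1).
latencies = {"FastMKA": 18.4, "MLA": 32.7, "GQA": 49.8, "MHA": 58.3}

baseline = latencies["MLA"]  # speedups on the page are reported relative to MLA

for method, ms in latencies.items():
    speedup = baseline / ms        # higher is better; MLA itself is 1.00x
    tokens_per_s = 1000.0 / ms     # throughput implied by per-token latency
    print(f"{method:8s} {ms:5.1f} ms/token  {speedup:4.2f}x vs MLA  {tokens_per_s:5.1f} tok/s")
```

Running this gives 1.78x for FastMKA, matching the table, and shows that GQA and MHA would come out below 1.0x against the MLA baseline.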