# Language Modeling Inference on Qwen2.5-7B (32K context length)
[Figure: Decoding Latency (ms/token) over time, with an alternate view for Speedup vs MLA. FastMKA reaches 10.3 ms/token as of Mar 21, 2026.]
## Evaluation Results

| Method | Setup | Date | Decoding Latency (ms/token) | Speedup vs MLA |
| --- | --- | --- | --- | --- |
| FastMKA | Batch size=1, Precisio... | 2026.03 | 10.3 | 1.59 |
| MLA | Batch size=1, Precisio... | 2026.03 | 16.4 | - |
| GQA | Batch size=1, Precisio... | 2026.03 | 24.3 | - |
| MHA | Batch size=1, Precisio... | 2026.03 | 28.6 | - |
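The Speedup vs MLA column appears to be the MLA baseline latency divided by each method's latency. A minimal Python sketch reproducing it from the table above, assuming that definition (the dictionary name and print format are illustrative, not part of the benchmark):

```python
# Reproduce the "Speedup vs MLA" column, assuming speedup is defined as
# MLA decoding latency divided by the method's decoding latency.
latency_ms_per_token = {
    "FastMKA": 10.3,
    "MLA": 16.4,
    "GQA": 24.3,
    "MHA": 28.6,
}

baseline = latency_ms_per_token["MLA"]
for method, latency in latency_ms_per_token.items():
    speedup = baseline / latency
    print(f"{method}: {latency:.1f} ms/token, {speedup:.2f}x vs MLA")
```

For FastMKA this gives 16.4 / 10.3 ≈ 1.59, matching the reported value.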