Sparse Decoding (GQA) on Synthetic 128K context (test)
[Figure: FlashInfer Latency (ms). Highlighted point: FlashInfer, 0.19, May 22, 2026. Available metrics: FlashInfer Latency (ms), Speedup (2x), Speedup (5x), Speedup (10x), Speedup (20x), Speedup (50x), Speedup (100x).]
Updated 8d ago
Evaluation Results
| Method | Configuration | Date | FlashInfer Latency (ms) | Speedup (2x) | Speedup (5x) | Speedup (10x) | Speedup (20x) | Speedup (50x) | Speedup (100x) |
|---|---|---|---|---|---|---|---|---|---|
| FlashInfer | Batch size (B)=1, Atte... | 2026.05 | 0.19 | - | - | - | - | - | - |
| FlashInfer | Batch size (B)=4, Atte... | 2026.05 | 0.72 | - | - | - | - | - | - |
| FlashInfer | Batch size (B)=8, Atte... | 2026.05 | 1.5 | - | - | - | - | - | - |
| FlashInfer | Batch size (B)=16, Att... | 2026.05 | 3.08 | - | - | - | - | - | - |
| Sparse Decode (Double Sparsity) | Batch size (B)=1, Atte... | 2026.05 | - | 0.28 | 0.56 | 0.83 | 1.12 | 1.46 | 1.65 |
| Sparse Decode (Double Sparsity) | Batch size (B)=4, Atte... | 2026.05 | - | 0.32 | 0.67 | 1.06 | 1.52 | 2.11 | 2.45 |
| Sparse Decode (Double Sparsity) | Batch size (B)=8, Atte... | 2026.05 | - | 0.36 | 0.75 | 1.18 | 1.68 | 2.3 | 2.66 |
| Sparse Decode (Double Sparsity) | Batch size (B)=16, Att... | 2026.05 | - | 0.41 | 0.85 | 1.31 | 1.82 | 2.46 | 2.81 |