Sparse Decoding (MHA) on Synthetic 128K Context (H100, FP16)
[Chart: FlashInfer Latency (ms) by date. Highlighted value: 0.7 (FlashInfer), May 22, 2026. Series: FlashInfer Latency (ms); Speedup Factor (2x), (5x), (10x), (20x), (50x), (100x).]
Updated 8d ago
Evaluation Results
| Method | Configuration | Date | FlashInfer Latency (ms) | Speedup Factor (2x) | Speedup Factor (5x) | Speedup Factor (10x) | Speedup Factor (20x) | Speedup Factor (50x) | Speedup Factor (100x) |
|---|---|---|---|---|---|---|---|---|---|
| FlashInfer | Batch size (B)=1, Atte... | 2026.05 | 0.7 | - | - | - | - | - | - |
| FlashInfer | Batch size (B)=4, Atte... | 2026.05 | 2.79 | - | - | - | - | - | - |
| FlashInfer | Batch size (B)=8, Atte... | 2026.05 | 5.61 | - | - | - | - | - | - |
| FlashInfer | Batch size (B)=16, Att... | 2026.05 | 11.18 | - | - | - | - | - | - |
| Sparse Decode (Double Sparsity) | Batch size (B)=1, Atte... | 2026.05 | - | 0.91 | 1.62 | 2.18 | 2.68 | 3.14 | 3.37 |
| Sparse Decode (Double Sparsity) | Batch size (B)=4, Atte... | 2026.05 | - | 1.02 | 1.86 | 2.56 | 3.17 | 3.74 | 4 |
| Sparse Decode (Double Sparsity) | Batch size (B)=8, Atte... | 2026.05 | - | 1.12 | 2 | 2.71 | 3.32 | 3.86 | 4.11 |
| Sparse Decode (Double Sparsity) | Batch size (B)=16, Att... | 2026.05 | - | 1.22 | 2.13 | 2.84 | 3.43 | 3.94 | 4.17 |
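The results report FlashInfer as an absolute latency and Sparse Decode (Double Sparsity) only as speedup factors. The sketch below shows one way to put the two on the same scale. It assumes, and the page does not state this, that each speedup factor is measured against the FlashInfer latency at the same batch size, so the implied sparse latency is the baseline latency divided by the speedup. The names `implied_latency_ms` and `implied_latency_table` are hypothetical helpers introduced for illustration; the numbers are copied from the results above.

```python
# Baseline FlashInfer latency (ms) per batch size, copied from the results.
FLASHINFER_LATENCY_MS = {1: 0.7, 4: 2.79, 8: 5.61, 16: 11.18}

# Labels of the speedup columns, in the order they appear in the results.
SPEEDUP_COLUMNS = ("2x", "5x", "10x", "20x", "50x", "100x")

# Sparse Decode (Double Sparsity) speedup factors per batch size.
SPEEDUP = {
    1: (0.91, 1.62, 2.18, 2.68, 3.14, 3.37),
    4: (1.02, 1.86, 2.56, 3.17, 3.74, 4.0),
    8: (1.12, 2.0, 2.71, 3.32, 3.86, 4.11),
    16: (1.22, 2.13, 2.84, 3.43, 3.94, 4.17),
}


def implied_latency_ms(baseline_ms: float, speedup: float) -> float:
    """Latency implied by a speedup factor.

    ASSUMPTION: speedup = baseline latency / sparse latency.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_ms / speedup


def implied_latency_table() -> dict:
    """Map batch size -> {column label: implied sparse latency in ms}."""
    return {
        batch: {
            label: implied_latency_ms(FLASHINFER_LATENCY_MS[batch], factor)
            for label, factor in zip(SPEEDUP_COLUMNS, factors)
        }
        for batch, factors in SPEEDUP.items()
    }


if __name__ == "__main__":
    for batch, row in implied_latency_table().items():
        cells = ", ".join(f"{label}: {ms:.2f} ms" for label, ms in row.items())
        print(f"B={batch}: {cells}")
```

Under this assumption, a speedup factor below 1 (such as 0.91 at B=1 in the 2x column) corresponds to a sparse latency higher than the FlashInfer baseline.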