Matrix Multiplication on Synthetic Transformer Shapes Query-Key Q ⊗ K⊤
[Chart: Latency (µs) over time. Leading entry: BWTA_QK, 5.41 µs, Apr 5, 2026]
Updated 11d ago
Evaluation Results
| Method | Configuration | Date | Latency (µs) |
|---|---|---|---|
| BWTA_QK | Shape=Q: [128, 128], K... | 2026.04 | 5.41 |
| BWTA_QK | Shape=Q: [512, 512], K... | 2026.04 | 8.96 |
| FP16 | Shape=Q: [128, 128], K... | 2026.04 | 39.31 |
| cuBLAS | Shape=Q: [128, 128], K... | 2026.04 | 52.56 |
| cuBLAS | Shape=Q: [512, 512], K... | 2026.04 | 58.23 |
| BWTA_QK | Shape=Q: [2048, 2048],... | 2026.04 | 144.1 |
| FP16 | Shape=Q: [512, 512], K... | 2026.04 | 145.7 |
| cuBLAS | Shape=Q: [2048, 2048],... | 2026.04 | 606.5 |
| FP16 | Shape=Q: [2048, 2048],... | 2026.04 | 2,834 |
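
The results are sorted by absolute latency, which mixes the three shapes together. To compare methods at the same shape, a small sketch like the one below may help. It copies the latencies listed above and divides a baseline's latency by a method's latency. The names `LATENCY_US` and `speedup` are hypothetical and not part of the benchmark, and entries are keyed by the Q dimension only, because the K shape is truncated in the listing.

```python
# Latencies in microseconds, copied from the results above.
# Keys are the Q dimension (Q: [n, n]); the K shape is truncated
# in the source, so it is deliberately not reproduced here.
LATENCY_US = {
    "BWTA_QK": {128: 5.41, 512: 8.96, 2048: 144.1},
    "FP16": {128: 39.31, 512: 145.7, 2048: 2834.0},
    "cuBLAS": {128: 52.56, 512: 58.23, 2048: 606.5},
}


def speedup(method: str, baseline: str, size: int) -> float:
    """Return baseline latency divided by method latency at one Q size.

    A value above 1 means `method` is faster than `baseline`.
    """
    return LATENCY_US[baseline][size] / LATENCY_US[method][size]


if __name__ == "__main__":
    for size in (128, 512, 2048):
        ratio = speedup("BWTA_QK", "cuBLAS", size)
        print(f"Q=[{size}, {size}]: BWTA_QK vs cuBLAS = {ratio:.1f}x")
```

These ratios only restate the numbers already listed; they say nothing about hardware, batch size, or measurement method, none of which appear on this page.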