
Linear Kernel Performance on 256x256x1 Matrix Multiplication NVIDIA H800 GPU (test)

Latency (us): 5.54

BWTA_QK

Apr 5, 2026
Updated 12d ago

Evaluation Results

Date      Latency (us)
2026.04   5.54           5,915
2026.04   6.71           4,883
2026.04   6.97           4,701
2026.04   7.29           4,495
2026.04   8.56           3,828
2026.04   9.08           3,609
2026.04   10.74          3,050
2026.04   60.33          543