| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Dot Product | CUDA-LLM task suite | Execution Time (ms) | 3.99 | 9 |
| Reduction | CUDA-LLM task suite | Time | 5.75 | 9 |
| Matrix Copy | CUDA-LLM task suite | Execution Time | 5.12 | 9 |
| ReLU Activation Function | CUDA-LLM task suite | Time | 4.49 | 9 |
| Reverse Array | CUDA-LLM task suite | Execution Time | 4.07 | 9 |
| Matrix Transpose | CUDA-LLM task suite | Time | 5.2 | 9 |
| Top-K Selection | CUDA-LLM task suite | Time | 5.9 | 5 |
| Histogramming | CUDA-LLM task suite | Latency (ms) | 7.55 | 5 |
| Monte Carlo Integration | CUDA-LLM task suite | Latency (ms) | 6.95 | 5 |
| Categorical Cross-Entropy Loss | CUDA-LLM task suite | Time (ms) | 6.4 | 5 |
| Prefix Sum | CUDA-LLM task suite | Time | 6.1 | 5 |
| Categorical Cross-Entropy Loss | CUDA-LLM kernels task suite (test) | Time (s) | - | 0 |
| Multi-Head Self-Attention | CUDA-LLM kernels task suite (test) | Latency (s) | - | 0 |
| Dot Product | CUDA-LLM task suite (test) | Latency | - | 0 |
| Reduction | CUDA-LLM task suite (test) | Time | - | 0 |
| Matrix Copy | CUDA-LLM task suite (test) | Latency (ms) | - | 0 |
| Reverse Array | CUDA-LLM task suite (test) | Time (ms) | - | 0 |
| Matrix Transpose | CUDA-LLM task suite (test) | Latency (ms) | - | 0 |
| Ordinary Least Squares Regression | CUDA-LLM task suite | Time | - | 0 |