| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| Natural Questions (NQ) | Perplexity | Relative Overhead (%): 0.019 | 90 | 26d ago | |
| HumanEval | FLy | Speedup Factor: 5.15 | 54 | 1mo ago | |
| DeepScaleR-40k (1,024 mathematical problems) | G-KV | Throughput (tokens/s): 760.74 | 26 | 1mo ago | |
| Samsung Galaxy S25 Qualcomm Snapdragon 8 Elite SoC inference v1.0 | LFM2-350M | Prefill Throughput (1K) (tokens/s): 1,067 | 20 | 1mo ago | |
| ImageNet-1k | Gaussian | Inference Length: -4.27 | 20 | 1mo ago | |
| MS-COCO | Gaussian | Sequence Length Delta: -18.33 | 20 | 1mo ago | |
| Model Profiling | ESPACE | Total GEMM Latency (ms): 15.9 | 19 | 1mo ago | |
| Synthetic Lego scene (test) | D-NeRF | Storage (MB): 4 | 15 | 1mo ago | |
| LLaMA2-7B 12/128 tokens | SWM (Ours) | Latency: 1.889 | 13 | 29d ago | |
| WQ | Perplexity | Relative Execution Time Overhead: 0.014 | 12 | 26d ago | |
| TQA | Perplexity | Relative Execution Time Overhead: 0.05 | 12 | 26d ago | |
| HotpotQA | SeleCom | Time to Last Token (ms): 496 | 12 | 1mo ago | |
| Inference Efficiency Evaluation | CS-LSTMs | Inference Latency (s): 0.0046 | 12 | 1mo ago | |
| On-device Samsung Galaxy S25 | LFM2 350M | Prefill TTFT (1k): 0.84 | 10 | 1mo ago | |
| LLaMA 8B 8K context length 3.1 | SpecKV | Theoretical Compute (TFLOPs): 159 | 10 | 1mo ago | |
| openPangu Embedded Efficiency Benchmark | openPangu-Embedded | Prefill Latency (ms): 528 | 10 | 1mo ago | |
| Qwen2.5-7B | | Throughput (tokens/s): 1,480.2 | 9 | 1mo ago | |
| HAGRID | SAM-Decoding[E2] | #MAT: 4.75 | 9 | 1mo ago | |
| LLaVA 7B 1.5 | TrimTokenator-LC | Latency (ms): 802.65 | 8 | 1mo ago | |
| 32k context length efficiency Llama-3-8B (test) | | Time To First Token (s): 4.12 | 7 | 1mo ago | |
| KV Cache Efficiency | L2-7B (Base) | SKV Count: 1,966 | 7 | 1mo ago | |
| Inference Efficiency Benchmark | | TTFT (ms): 60 | 6 | 8d ago | |
| LLaVA-NeXT Inference | | Inference Time (s): 7.998 | 6 | 1mo ago | |
| LLaMA 8B 32K context length 3.1 | SpecKV | Theoretical Compute (TFLOPs): 1,115 | 5 | 1mo ago | |
| Real-world Robotic Setup inference 1.0 (test) | DP3 | Action Expert Latency (ms): 58.77 | 4 | 1mo ago | |