| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| High-load generation workload w/ NVLink | DeInfer | TTFT (ms) | 443 | 27 | 1mo ago |
| High-load generation workload w/o NVLink | DeInfer (w/ low-rank KV cache) | Time To First Token (ms) | 878 | 27 | 1mo ago |
| AGENTPREFIX derived from τ-bench (trace) | ContextPilot | TTFT P50 (s) | 21.39 | 12 | 22d ago |
| ToolBench | AugServe | Effective Throughput (req/s) | 1.11 | 9 | 3mo ago |
| Merge dataset | AugServe | Effective Throughput (req/s) | 0.6 | 9 | 3mo ago |
| Synthetic workloads (maximum serving capacity) | DuetServe | Throughput (req/s) | 12.8 | 6 | 1d ago |
| LLaMA-2 70B chatbot workload | GPU-Only | TTFT (ms) | 45.2 | 4 | 3mo ago |
| Azure-Conv | DUETSERVE | Throughput (req/s) | 8.02 | 2 | 1d ago |
| Node multi-node inference workload | PALS | Efficiency Gain | 26.3 | 1 | 13d ago |