| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Qwen3 Models (test) | AWQ | Throughput (k tokens/s) | 120.62 | 30 | 26d ago |
| Qwen3 Query Projection Module NVIDIA A40 | TTQ (r = 0) | Throughput (k tokens/s) | 80.63 | 30 | 26d ago |
| Qwen3-8B | ROCKET | Throughput (tokens/s) | 26.74 | 9 | 1mo ago |
| Llama Instruct 3.1-8B (internal harness) | Pre-compressed only | Throughput (TPS) | 6,991 | 8 | 1mo ago |
| AIME24/25 | Mix-RL-4B | Throughput (tokens/s) | 5,888 | 6 | 1mo ago |
| Inference Throughput Benchmark H200 GPU | Surefire-1B | Throughput (2k Input) | 13,890 | 5 | 1mo ago |
| LLaMA-3 8B | | Decode Throughput (tokens/s) | 1,020 | 4 | 18d ago |
| 7-layer 512 x 512 MLP | AIE4ML | Throughput (TOPS) | 113.4 | 4 | 1mo ago |
| Llama-8B | GPTQ | Throughput (tokens/s) | 115.2 | 4 | 1mo ago |
| Llama-3B | GPTQ | Throughput (tokens/s) | 215.6 | 4 | 1mo ago |
| Llama-1B | GPTQ | Throughput (tokens/s) | 310.5 | 4 | 1mo ago |
| 64K scenario | gpt-oss-puzzle-88B | Speedup Ratio | 1.4 | 1 | 1mo ago |