| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Qwen3 Models (test) | AWQ | Throughput (k tokens/sec) | 120.62 | 30 | 2mo ago |
| Qwen3 Query Projection Module NVIDIA A40 | TTQ (r = 0) | Throughput (k tokens/sec) | 80.63 | 30 | 2mo ago |
| GPT-J 6B | GPT-J-6B (Proposed Method) | Inference Throughput (tokens/s) | 1,484 | 16 | 19d ago |
| Qwen3-8B | ROCKET | Throughput (tokens/s) | 26.74 | 9 | 3mo ago |
| Llama Instruct 3.1-8B (internal harness) | Pre-compressed only | Throughput (TPS) | 6,991 | 8 | 2mo ago |
| BERT, GPT-2, and OPT Inference Workload BS=2, SL=256 | BERT | Original Throughput (tokens/s) | 57,117.4 | 6 | 19d ago |
| AIME24/25 | Mix-RL-4B | Throughput (token/s) | 5,888 | 6 | 3mo ago |
| Inference Throughput Benchmark H200 GPU | Surefire-1B | Throughput (2k Input) | 13,890 | 5 | 3mo ago |
| LLaMA-3 8B | | Decode Throughput (tok/s) | 1,020 | 4 | 2mo ago |
| 7-layer 512 x 512 MLP | AIE4ML | Throughput (TOPS) | 113.4 | 4 | 3mo ago |
| Llama-8B | GPTQ | Throughput (Tokens/s) | 115.2 | 4 | 3mo ago |
| Llama-3B | GPTQ | Throughput (TOK/s) | 215.6 | 4 | 3mo ago |
| Llama-1B | GPTQ | Throughput (Tokens/sec) | 310.5 | 4 | 3mo ago |
| ISL 8192 OSL 16384 workload | Nano v3 12BA2A | Max Batch Size | 224 | 3 | 23d ago |
| 64K scenario | gpt-oss-puzzle-88B | Speedup Ratio | 1.4 | 1 | 3mo ago |