Inference Throughput

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
Qwen3 Models (test)	AWQ	Throughput (k tokens/sec)	120.62	30	4mo ago
Qwen3 Query Projection Module NVIDIA A40	TTQ (r = 0)	Throughput (k tokens/sec)	80.63	30	4mo ago
SAMSum	BUDDY	Prefill (tokens/s)	6,581.55	18	1mo ago
GPT-J 6B	GPT-J-6B (Proposed Method)	Inference Throughput (tokens/s)	1,484	16	2mo ago
64K scenario	SparDA	Prefill Throughput (tok/s)	8,191.3	10	1mo ago
Qwen3-8B	ROCKET	Throughput (tokens/s)	26.74	9	4mo ago
Llama Instruct 3.1-8B (internal harness)	Pre-compressed only	Throughput (TPS)	6,991	8	4mo ago
BERT, GPT-2, and OPT Inference Workload BS=2, SL=256	BERT	Original Throughput (tokens/s)	57,117.4	6	2mo ago
AIME24/25	Mix-RL-4B	Throughput (token/s)	5,888	6	4mo ago
Alpaca	BUDDY	Prefill Throughput (tokens/s)	3,348.18	5	1mo ago
Inference Throughput Benchmark H200 GPU	Surefire-1B	Throughput (2k Input)	13,890	5	4mo ago
700M random-initialized models	BDLM attention	Throughput (Seq Len 256)	1,935	4	18d ago
LLaMA-3 8B		Decode Throughput (tok/s)	1,020	4	3mo ago
7-layer 512 x 512 MLP	AIE4ML	Throughput (TOPS)	113.4	4	4mo ago
Llama-8B	GPTQ	Throughput (Tokens/s)	115.2	4	4mo ago
Llama-3B	GPTQ	Throughput (TOK/s)	215.6	4	4mo ago
Llama-1B	GPTQ	Throughput (Tokens/sec)	310.5	4	4mo ago
ISL 8192 OSL 16384 workload	Nano v3 12BA2A	Max Batch Size	224	3	2mo ago
