Large Language Model Inference

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
LLaMA-2 7B (inference)	JIT+CUDA	P99 Per-Token Latency (ms)	8.23	33	2mo ago
Code Assistant	DReSD	TPS	104	20	4mo ago
Decode Phase BS=1	BWTA (Bitnet-b1.58-2B)	Latency (s)	0.152	18	3mo ago
FinanceQA	SpecBundle	Throughput	1,779	18	4mo ago
GPQA	SpecBundle	Throughput	2,341.3	18	4mo ago
Prefill Phase SeqLen=2k	BWTA (Bitnet-b1.58-2B)	Prefill Time (s)	0.025	15	3mo ago
Held-out datasets chatbot_instruction_prompts and finance-alpaca (test)	Aurora (Qwen3-Coder-Next (FP8))	Throughput (TPS)	265.7	14	4mo ago
Prompt length-1		TTFT	0.05	8	1mo ago
Qwen2.5-7B (test)	TAQ-IS	Throughput	37.29	7	2mo ago
8K context (test)		Q Score	81.35	6	2mo ago
Qwen3-0.6B (inference)	EDGERAZOR	Storage (GB)	0.255	6	2mo ago
Llama-3-8B	GPTQ	Throughput (Tok/s)	130.7	5	1mo ago
Llama 3.2 1B		TPOTH	1.94	4	4mo ago
Qwen VL 7B 2.5	LUQ	Throughput (Intel i7-13620H)	9	3	1mo ago
Synthetic heavy-tail workload Pareto distribution	BatchLLM	Throughput (req/s)	17.02	2	3mo ago
Llama 3.1	RoME	Latency (ms)	48.1	2	3mo ago
