
Inference Efficiency

Benchmarks

Dataset Name	SOTA Method	Metric	Value		Updated
Natural Questions (NQ)	Perplexity	Relative Overhead (%)	0.019	90	4mo ago
HumanEval	CATS w/ EAGLE	Speedup Factor	5.33	90	2mo ago
DeepScaleR-40k (1,024 mathematical problems)	G-KV	Throughput (tokens/s)	760.74	26	4mo ago
Samsung Galaxy S25 Qualcomm Snapdragon 8 Elite SoC inference v1.0	LFM2-350M	Prefill Throughput (1K) (tokens/s)	1,067	20	4mo ago
ImageNet-1k	Gaussian	Inference Length	-4.27	20	4mo ago
MS-COCO	Gaussian	Sequence Length Delta	-18.33	20	4mo ago
Model Profiling	ESPACE	Total GEMM Latency (ms)	15.9	19	4mo ago
General LLM Prompts	SelfCD	ATGR	2.08	18	3mo ago
Synthetic Lego scene (test)	D-NeRF	Storage (MB)	4	15	4mo ago
LLaMA2-7B 12/128 tokens	SWM (Ours)	Latency	1.889	13	4mo ago
WQ	Perplexity	Relative Execution Time Overhead	0.014	12	4mo ago
TQA	Perplexity	Relative Execution Time Overhead	0.05	12	4mo ago
HotpotQA	SeleCom	Time to Last Token (ms)	496	12	4mo ago
Inference Efficiency Evaluation	CS-LSTMs	Inference Latency (s)	0.0046	12	4mo ago
1024-token sequences (inference summary)	Opir-edge	Throughput (Samples/s)	499.49	11	1mo ago
Medical VQA benchmarks (PMC-VQA, PathVQA, SLAKE, VQA-RAD, Omni, MMMU-Med, MedX)	ViToS	Inference Time (min)	11	10	24d ago
On-device Samsung Galaxy S25	LFM2 350M	Prefill TTFT (1k)	0.84	10	4mo ago
LLaMA 8B 8K context length 3.1	SpecKV	Theoretical Compute (TFLOPs)	159	10	4mo ago
openPangu Embedded Efficiency Benchmark	openPangu-Embedded	Prefill Latency (ms)	528	10	4mo ago
MoE LLMs DSV2-16B, QW3-30B, QW3-80B-I	BITSMOE	Decode Speed (tokens/sec)	12.46	9	1mo ago
Qwen2.5-7B		Throughput (tokens/s)	1,480.2	9	4mo ago
HAGRID	SAM-Decoding[E2]	#MAT	4.75	9	4mo ago
Generic Evaluation Dataset	PerCo	Encoding Time	0.08	8	2mo ago
1x V100 (16GB) (synthetic)	SRM	Throughput (tokens/s)	28,298	8	2mo ago
128K-context	IT-SPEED-16	TTFT	101	8	2mo ago

Showing 25 of 59 rows