SOTA LLM Inference benchmarks and papers with code | Wizwand
LLM Inference
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| Alpaca | CATS w/ EAGLE | Speedup | 5.56 | 57 | 21d ago |
| LLaMA2 7B | JIT+CUDA | TTFT (ms) | 11.09 | 33 | 1mo ago |
| Qwen3-8B 2k prompts Decode-heavy workload | EB+ | Throughput (tok/s) | 46,782 | 30 | 1d ago |
| Qwen3-8B 2k prompts Balanced workload | v1 | Throughput (tok/s) | 61,106 | 28 | 1d ago |
| Qwen3-8B 2k prompts Prefill-heavy workload | v1 | Throughput (tok/s) | 106,324 | 26 | 1d ago |
| MBPP sanitized | TIDE | Throughput (tokens/s) | 2.44 | 24 | 14d ago |
| Aggregate Mean over Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, Math500, MBPP, MT-Bench | DART | Mean Speedup | 2.87 | 21 | 3mo ago |
| MBPP | DART | Speedup | 3.09 | 21 | 3mo ago |
| Math500 | DART | Speedup | 2.84 | 21 | 3mo ago |
| LiveCodeBench | DART | Speedup | 2.81 | 21 | 3mo ago |
| Qwen3-8B workload (c=512) on RTX PRO 6000 | EB+ | SLO Attainment (%) | 80.3 | 18 | 1d ago |
| Transformers (PyTorch) workflow Qwen3 family (inference) | CryptoTensors | Model Load Time (s) | 1.16 | 18 | 3mo ago |
| ToolBench | AugServe | Goodput (req/s) | 3.9 | 18 | 3mo ago |
| Merge | AugServe | Goodput (req/s) | 1.16 | 18 | 3mo ago |
| Qwen3-8B synthetic workload (mu_L=512, mu_O=256) | EB+ | Throughput (tok/s) | 48,015 | 16 | 1d ago |
| LLaMA-7B v1 (serving) | FlashSVD v1.5 | Decode Latency (ms/token) | 12.16 | 16 | 22d ago |
| Llama 3.2 Samsung Galaxy S25 Ultra 1B (test) | ET QNN | Prefill Min Throughput (tokens/sec) | 2,813.19 | 13 | 22d ago |
| Qwen3 Samsung Galaxy S25 Ultra 0.6B (test) | llama.cpp | Prefill Throughput (min) | 1,709.9 | 12 | 22d ago |
| Meta-Llama-3-8B prompt/gen = 128 tokens | AWQ (4-bit) | Memory Usage (MB) | 5,463 | 12 | 27d ago |
| WildChat | v1 | RPS (Requests/s) | 75.17 | 11 | 1d ago |
| Llama 3.2 Google Pixel 9 Pro XL 1B (test) | ET Vulkan | Prefill Throughput (min) (tokens/sec) | 530.02 | 10 | 22d ago |
| Qwen3 Google Pixel 9 Pro XL 0.6B (test) | ET Vulkan | Prefill Throughput (min, tokens/sec) | 591.01 | 10 | 22d ago |
| Phi4 Mini Samsung Galaxy S25 Ultra 3.8B (test) | ET QNN | Prefill Throughput (min, tokens/sec) | 1,161.29 | 10 | 22d ago |
| ToolBench dataset | AugServe | SLO Attainment | 100 | 9 | 3mo ago |
| Merge dataset | AugServe | SLO Attainment | 54.3 | 9 | 3mo ago |
Showing 25 of 44 rows