SOTA LLM Inference benchmarks and papers with code | Wizwand
| Dataset Name | SOTA Method | Metric | Best Result | Results | Last Updated |
|---|---|---|---|---|---|
| Aggregate mean over Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, Math500, MBPP, MT-Bench | DART | Mean Speedup | 2.87 | 21 | 1mo ago |
| MBPP | DART | Speedup | 3.09 | 21 | 1mo ago |
| Math500 | DART | Speedup | 2.84 | 21 | 1mo ago |
| LiveCodeBench | DART | Speedup | 2.81 | 21 | 1mo ago |
| Alpaca | DART | Speedup | 2.95 | 21 | 1mo ago |
| Transformers (PyTorch) workflow, Qwen3 family (inference) | CryptoTensors | Model Load Time (s) | 1.16 | 18 | 1mo ago |
| ToolBench | AugServe | Goodput (req/s) | 3.9 | 18 | 1mo ago |
| Merge | AugServe | Goodput (req/s) | 1.16 | 18 | 1mo ago |
| ToolBench dataset | AugServe | SLO Attainment | 100 | 9 | 1mo ago |
| Merge dataset | AugServe | SLO Attainment | 54.3 | 9 | 1mo ago |
| Long-Context LLM Inference (Decode) | Reuse | Latency (ms) | 0.13 | 8 | 1mo ago |
| Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, Math500, MBPP, and MT-Bench | DART | Speedup (Alpaca) | 2.61 | 8 | 1mo ago |
| Qwen2.5 0.5B Instruct | CUDA (compiled, RTX 5090) | Throughput (tok/s) | 185.5 | 6 | 13d ago |
| WebLLM, macOS, Apple M2, 16GB unified memory | Qwen2.5-0.5B | Decode Throughput (tok/s) | 46.4 | 6 | 13d ago |
| Long-Context LLM Inference (Prefill) | Kascade | Prefill Latency (ms) | 0.62 | 6 | 1mo ago |
| LLaMA-2 70B, sequence length 2048 | CXL-SpecKV + Comp | Max Batch Size | 384 | 5 | 1mo ago |
| Qwen2.5 1.5B Instruct | CUDA (eager, RTX 5090) | Throughput (tok/s) | 155.3 | 4 | 13d ago |
| WebLLM, Windows 11, RTX PRO 2000 Blackwell 8GB | Qwen2.5-0.5B | Decode Throughput (tok/s) | 51.1 | 4 | 13d ago |
| LLaMA-2 7B | Baseline | Latency (ms) | 1,052.24 | 4 | 1mo ago |
| Qwen2.5 0.5B | Baseline | Latency (ms) | 139.98 | 4 | 1mo ago |
| OPT-125M | Baseline | Latency (ms) | 46.71 | 4 | 1mo ago |
Showing 21 of 21 rows