Inference Throughput on LLaMA-3 8B
[Chart: Decode Throughput (tok/s). Highlighted value: IQ3_S, 1,020 tok/s, Mar 30, 2026. Selectable metrics: Decode Throughput (tok/s), Prefill Throughput (tok/s), Speedup vs FP16.]
Updated 18d ago
Evaluation Results
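| Method | Details | Date | Decode Throughput (tok/s) | Prefill Throughput (tok/s) | Speedup vs FP16 |
|--------|---------|------|---------------------------|----------------------------|-----------------|
| IQ3_S | Hardware=RTX 5090, Bas... | 2026.03 | 1,020 | 47,800 | 2.1 |
| ITQ3_S | Hardware=RTX 5090, Bas... | 2026.03 | 960 | 51,200 | 2 |
| Q4_K_M | Hardware=RTX 5090, Bas... | 2026.03 | 890 | 42,100 | 1.9 |
| FP16 | Hardware=RTX 5090, Bas... | 2026.03 | 480 | 28,400 | 1 |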
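The page does not say how the Speedup vs FP16 column is computed. The listed values are consistent with dividing each method's decode throughput by the FP16 decode throughput and rounding to one decimal place, so the sketch below assumes that definition. The function name `speedup_vs_baseline` is hypothetical and used only for illustration; the numbers are copied from the results above.

```python
# Decode throughput in tokens/s, copied from the results above.
decode_tok_s = {
    "IQ3_S": 1020,
    "ITQ3_S": 960,
    "Q4_K_M": 890,
    "FP16": 480,
}


def speedup_vs_baseline(throughputs, baseline="FP16"):
    """Divide each throughput by the baseline's and round to one decimal.

    Assumption: the speedup is a ratio of decode throughputs. The page
    does not state the formula; this only reproduces the listed values.
    """
    base = throughputs[baseline]
    return {name: round(value / base, 1) for name, value in throughputs.items()}


speedups = speedup_vs_baseline(decode_tok_s)
for name, ratio in speedups.items():
    print(f"{name}: {ratio}x")
```

Applying the same ratio to the prefill figures gives different numbers (for example 47,800 / 28,400 is about 1.7 for IQ3_S), which is why the decode-based reading is assumed here.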