Inference Throughput on Llama Instruct 3.1-8B (internal harness)
[Chart: Throughput (TPS). Pre-compressed only: 6,991 on Mar 18, 2026.]
Updated 1mo ago
Evaluation Results
Entry                    Method                Date      Throughput (TPS)
Pre-compressed only      Pre-compression       2026.03   6,991
Pre-comp. + clustering   Pre-comp. + K-m...    2026.03   6,872
AWQ INT4                 Uniform quant.        2026.03   4,606
FP8                      Uniform quant.        2026.03   3,316
Baseline (FP16)          Full precision        2026.03   2,194
Clustering (Orig)        K-means (rebuil...    2026.03   2,194
GPTQ INT4                Uniform quant.        2026.03   1,812
AQLM 2-bit               Codebook              2026.03   1,023