Language Model Inference Efficiency on Meta-Llama-3-8B
[Figure: Throughput (tokens/s) vs. Latency (ms/token) over time; leading method AWQ at 1,398 tokens/s as of Feb 27, 2026.]
Evaluation Results
| Method | Configuration | Date | Throughput (tokens/s) | Latency (ms/token) |
|---|---|---|---|---|
| AWQ | Quantization=4-bit | 2026.02 | 1,398 | 0.715 |
| DACQ Hybrid | Quantization=4-bit | 2026.02 | 1,022.5 | 0.978 |
| DACQ Logistic | Quantization=4-bit | 2026.02 | 975.05 | 1.025 |
| Unquantized | Precision=FP16 | 2026.02 | 857.6 | 1.166 |
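The latency column appears to be the reciprocal of throughput (1000 ms / tokens-per-second), and the 4-bit AWQ entry runs roughly 1.63× faster than the unquantized FP16 baseline. A small sketch that checks both relationships against the table values (the `results` dict is just the table above transcribed, not an API from the leaderboard):

```python
# Reported (throughput in tokens/s, latency in ms/token) per method,
# transcribed from the Evaluation Results table.
results = {
    "AWQ": (1398.0, 0.715),
    "DACQ Hybrid": (1022.5, 0.978),
    "DACQ Logistic": (975.05, 1.025),
    "Unquantized (FP16)": (857.6, 1.166),
}

for method, (throughput, latency) in results.items():
    # Latency derived from throughput: 1000 ms / (tokens per second).
    derived = 1000.0 / throughput
    print(f"{method}: reported {latency} ms/token, derived {derived:.3f} ms/token")

# Speedup of the best 4-bit method over the FP16 baseline.
speedup = results["AWQ"][0] / results["Unquantized (FP16)"][0]
print(f"AWQ speedup over FP16: {speedup:.2f}x")
```

The derived latencies match the reported column to within rounding, which suggests latency here is per-token inverse throughput rather than an independently measured quantity.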