Inference Latency Measurement on LLaMA-13B (13824 × 5120 linear layer)
[Chart: Latency (ms) and Speed-up Ratio by method. Highlighted point: MBOK, 0.0499 ms, May 28, 2025.]
Evaluation Results
Method | Setting                    | Date    | Latency (ms) | Speed-up Ratio
MBOK   | Batch Size=1, Hardware...  | 2025.05 | 0.0499       | 8.7
FP16   | Batch Size=1, Hardware...  | 2025.05 | 0.4341       | -
QUIP#  | Batch Size=1, Hardware...  | 2025.05 | 0.6284       | 0.69
QTIP   | Batch Size=1, Hardware...  | 2025.05 | 5.2119       | 0.08
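
The page does not say how the Speed-up Ratio is defined. The reported values appear consistent with dividing the FP16 latency by each method's latency, so a value above 1 would mean faster than FP16 and a value below 1 slower. The sketch below checks that reading against the latencies listed above. The baseline choice and the function name `speedup_vs_baseline` are assumptions made for illustration, not something the source confirms.

```python
# Latencies in milliseconds, copied from the results above (batch size 1).
latency_ms = {
    "MBOK": 0.0499,
    "FP16": 0.4341,
    "QUIP#": 0.6284,
    "QTIP": 5.2119,
}


def speedup_vs_baseline(latencies, baseline="FP16"):
    """Return baseline latency divided by each other method's latency.

    Assumption: the baseline is the FP16 row, which the source
    lists with "-" in the Speed-up Ratio column.
    """
    base = latencies[baseline]
    return {
        name: base / ms
        for name, ms in latencies.items()
        if name != baseline
    }


ratios = speedup_vs_baseline(latency_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Under this assumption the computed ratios round to the reported 8.7, 0.69, and 0.08, which suggests that MBOK is the only listed method faster than the FP16 baseline in this setting.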