Interactive Serving on Llama 3.1-8B-Instruct (512->128 tokens, concurrency=1)
[Chart: Throughput (Tok/s) by method. Leading entry: Pre-comp. clustering rebuild, 269 Tok/s (Mar 18, 2026). Selectable metrics: Throughput (Tok/s), TTFT (ms), TPOT (ms), Avg Power (W), TFLOPs, Storage (GB).]
Evaluation Results
| Method | Note | Date | Throughput (Tok/s) | TTFT (ms) | TPOT (ms) | Avg Power (W) | TFLOPs | Storage (GB) |
|---|---|---|---|---|---|---|---|---|
| Pre-comp. clustering rebuild | Best trade-off | 2026.03 | 269 | 17 | 3.61 | 186 | 0.87 | 3.1 |
| Pre-compressed | Best throughput | 2026.03 | 259 | 18.6 | 3.74 | 188 | 0.83 | 6.1 |
| Orig clustering rebuild | Storage reduction... | 2026.03 | 157 | 25.6 | 6.21 | 232 | 0.51 | 6.9 |
| Original | Baseline | 2026.03 | 155 | 25.1 | 6.3 | 266 | 0.5 | 15 |
| Pre-comp. LUT | Kernel-bound | 2026.03 | 34 | 135 | 28.2 | 166 | 0.11 | 3 |
| Orig LUT | Kernel-bound | 2026.03 | 14 | 373 | 67 | 177 | 0.05 | 6.5 |
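The reported metrics can be cross-checked against each other. The sketch below is a minimal illustration, not the benchmark's own tooling: it copies the values from the results above and assumes that, at concurrency=1, throughput is roughly the 128 output tokens divided by TTFT plus 127 further TPOT intervals. The page does not state how throughput was computed, so that latency model, the energy-per-token formula (average power divided by throughput), and all function names here are assumptions made for illustration. Under this assumed model the estimates land within 1 Tok/s of every reported throughput.

```python
# Values copied from the results above; the formulas are assumptions,
# not the benchmark's documented methodology.
OUTPUT_TOKENS = 128  # from the "512->128 tokens" setup in the title

# method: (throughput Tok/s, TTFT ms, TPOT ms, avg power W)
RESULTS = {
    "Pre-comp. clustering rebuild": (269, 17.0, 3.61, 186),
    "Pre-compressed": (259, 18.6, 3.74, 188),
    "Orig clustering rebuild": (157, 25.6, 6.21, 232),
    "Original": (155, 25.1, 6.3, 266),
    "Pre-comp. LUT": (34, 135.0, 28.2, 166),
    "Orig LUT": (14, 373.0, 67.0, 177),
}


def estimated_throughput(ttft_ms, tpot_ms, n_out=OUTPUT_TOKENS):
    """Assumed model: the first token arrives after TTFT, and each of the
    remaining n_out - 1 tokens arrives after one TPOT interval."""
    total_s = (ttft_ms + (n_out - 1) * tpot_ms) / 1000.0
    return n_out / total_s


def energy_per_token(power_w, tok_per_s):
    """Joules per generated token: average watts / tokens per second."""
    return power_w / tok_per_s


def summarize(results=RESULTS, baseline="Original"):
    """Derive speedup over the baseline, energy per token, and the
    latency-based throughput estimate for every method."""
    base_tps = results[baseline][0]
    rows = []
    for name, (tps, ttft, tpot, power) in results.items():
        rows.append({
            "method": name,
            "speedup": tps / base_tps,
            "joules_per_token": energy_per_token(power, tps),
            "estimated_tps": estimated_throughput(ttft, tpot),
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(f"{row['method']:<30} "
              f"x{row['speedup']:.2f}  "
              f"{row['joules_per_token']:.2f} J/tok  "
              f"est. {row['estimated_tps']:.1f} Tok/s")
```

Energy per token is a useful companion to raw throughput here because the faster methods also draw less average power, so the gap to the baseline is wider in joules per token than in Tok/s alone.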