Inference Efficiency on 32k Context Length Llama-3-8B (test)
[Chart: Time To First Token (s) for the Full Model baseline over time, around 4.11-4.25 s (latest point: Mar 15, 2026). Selectable metrics: Time To First Token (s), Time Per Output Token (s), Memory Usage (GB).]
Evaluation Results
| Method | Configuration | Date | Time To First Token (s) | Time Per Output Token (s) | Memory Usage (GB) |
|---|---|---|---|---|---|
| Full Model | Model=Llama-3-8B, Cont... | 2026.03 | 4.12 | 0.081 | 24.27 |
| StreamingLLM | Model=Llama-3-8B, Cont... | 2026.03 | 4.12 | 0.032 | 16.12 |
| H2O | Model=Llama-3-8B, Cont... | 2026.03 | 4.12 | 0.033 | 16.58 |
| SnapKV | Model=Llama-3-8B, Cont... | 2026.03 | 4.13 | 0.032 | 16.37 |
| SemantiCache | Model=Llama-3-8B, Cont... | 2026.03 | 4.25 | 0.031 | 15.94 |
| CaM | Model=Llama-3-8B, Cont... | 2026.03 | 4.28 | 0.039 | 17.03 |
| D2O | Model=Llama-3-8B, Cont... | 2026.03 | 4.29 | 0.038 | 16.91 |
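The two latency metrics reported above can be measured with a simple timing harness over any token stream: time to first token (TTFT) is the delay from request start until the first token arrives, and time per output token (TPOT) is the average gap between subsequent tokens. A minimal sketch follows; `fake_stream` is a hypothetical stand-in for a real model's streaming output, not part of any benchmarked method.

```python
import time

def measure_streaming_latency(stream):
    """Measure TTFT and average TPOT for any iterator that yields tokens."""
    start = time.perf_counter()
    first_token_time = None
    n_tokens = 0
    for _ in stream:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now  # first token observed -> TTFT reference
        n_tokens += 1
    ttft = first_token_time - start
    # TPOT averages the decode interval over tokens after the first one.
    tpot = (time.perf_counter() - first_token_time) / max(n_tokens - 1, 1)
    return ttft, tpot

# Hypothetical token stream; a real run would iterate over the model
# server's streaming response instead.
def fake_stream(n=5, delay=0.01):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

ttft, tpot = measure_streaming_latency(fake_stream())
print(f"TTFT={ttft:.3f}s  TPOT={tpot:.3f}s")
```

With this convention, total generation latency for `n` output tokens is roughly `TTFT + (n - 1) * TPOT`, which is why the KV-cache compression methods above cut TPOT and memory sharply while leaving TTFT (dominated by the 32k-token prefill) nearly unchanged.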