Inference Efficiency on 90k Context Length Llama-3.1-8B
[Chart: benchmark metrics over time for Throughput (queries/s), Setup Time (sec), Setup Time (Relative), and Inference Latency; highlighted point: Finetuning at 8.9 queries/s, Mar 11, 2025]
Evaluation Results
| Method | Configuration | Links | Throughput (queries/s) | Setup Time (sec) | Setup Time (Relative) | Inference Latency |
| --- | --- | --- | --- | --- | --- | --- |
| Finetuning | Model=Llama-3.1-8B, GP... | 2025.03 | 8.9 | - | - | - |
| DBSA | Model=Llama-3.1-8B, GP... | 2025.03 | 7.7 | - | - | - |
| Fixed ICL | Caching Strategy=cache... | 2025.03 | 6.8 | - | - | - |
| RetICL | Caching Strategy=no ca... | 2025.03 | 0.4 | - | - | - |
| RetICL | Model=Llama-3.1-8B, Ha... | 2025.03 | - | - | 1 | 1 |
| Fixed ICL | Model=Llama-3.1-8B, Ha... | 2025.03 | - | - | 6.5 | 0.06 |
| Finetuning | Model=Llama-3.1-8B, Ha... | 2025.03 | - | - | - | 0.046 |
| DBSA | Model=Llama-3.1-8B, Ha... | 2025.03 | - | - | 4 | 0.053 |
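For a quick comparison of the throughput rows above, the entries can be normalized against the slowest method. This is a minimal sketch using only the Throughput (queries/s) values copied from the table; the choice of RetICL (no caching) as the baseline is ours, not the leaderboard's.

```python
# Throughput (queries/s) values copied from the evaluation table above.
throughput = {
    "Finetuning": 8.9,
    "DBSA": 7.7,
    "Fixed ICL": 6.8,
    "RetICL": 0.4,
}

# Express each method's throughput relative to the slowest entry
# (RetICL with no caching), rounded for readability.
baseline = throughput["RetICL"]
relative = {method: round(qps / baseline, 2) for method, qps in throughput.items()}
print(relative)  # e.g. Finetuning runs at 22.25x the RetICL throughput
```

By this normalization, Finetuning, DBSA, and Fixed ICL are all more than an order of magnitude faster than retrieval without caching on this 90k-context workload.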