Inference Efficiency on 30k Context Length Llama-2-7B
[Chart: Inference Throughput (QPS) by method, Mar 11, 2025. Highest bar: Finetuning at 6.6 QPS. Toggles for Setup Time (sec), Setup Time (Relative), and Inference Latency (Relative) views.]
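As a minimal sketch of how a throughput number like the chart's QPS (queries per second) values can be obtained: run a fixed number of queries back to back and divide the count by the elapsed wall-clock time. Here `run_inference` is a hypothetical stand-in for the benchmarked model call, not part of this leaderboard's harness.

```python
import time

def measure_qps(run_inference, num_queries=100):
    """Run `num_queries` sequential queries and report queries per second."""
    start = time.perf_counter()
    for _ in range(num_queries):
        run_inference()
    elapsed = time.perf_counter() - start
    return num_queries / elapsed

# Example with a dummy workload standing in for a model call:
qps = measure_qps(lambda: sum(range(1000)), num_queries=50)
```

Real harnesses typically also warm up the model first and report a steady-state average over several runs.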
Evaluation Results
| Method | Links | Inference Throughput (QPS) | Setup Time (sec) | Setup Time (Relative) | Inference Latency (Relative) |
|---|---|---|---|---|---|
| Finetuning (Model=Llama-2-7B, GPU=...) | 2025.03 | 6.6 | - | - | - |
| DBSA (Model=Llama-2-7B, GPU=...) | 2025.03 | 3.4 | - | - | - |
| Fixed ICL (Caching Strategy=cache...) | 2025.03 | 1.5 | - | - | - |
| RetICL (Caching Strategy=no ca...) | 2025.03 | 0.8 | - | - | - |
| RetICL (Model=Llama-2-7B, Hard...) | 2025.03 | - | - | 1 | 1 |
| Fixed ICL (Model=Llama-2-7B, Hard...) | 2025.03 | - | - | 4.5 | 0.51 |
| Finetuning (Model=Llama-2-7B, Hard...) | 2025.03 | - | - | - | 0.12 |
| DBSA (Model=Llama-2-7B, Hard...) | 2025.03 | - | - | 3 | 0.22 |
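The "Relative" columns can be read as measurements normalized against a baseline run. A sketch of that normalization, under the assumption (not stated on the page) that RetICL, whose relative setup time and latency are both 1, is the baseline; the absolute setup times below are hypothetical numbers chosen only to illustrate the arithmetic:

```python
def to_relative(absolute, baseline):
    """Normalize an absolute measurement against the baseline run's value."""
    return absolute / baseline

# Hypothetical absolute setup times (seconds), not from the leaderboard:
baseline_setup = 10.0    # assumed RetICL setup time
fixed_icl_setup = 45.0   # assumed Fixed ICL setup time

relative = to_relative(fixed_icl_setup, baseline_setup)
print(relative)  # 4.5, the ratio reported for Fixed ICL in the table
```

Under this reading, Fixed ICL takes 4.5x the baseline's setup time but runs inference at 0.51x its latency.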