Inference Efficiency on 30k Context Length (Llama-3.1-8B)
[Chart: Inference Throughput (QPS) vs. Setup Time (sec) per method; best throughput 15.8 QPS. Mar 11, 2025.]
Evaluation Results
| Method | Configuration | Date | Inference Throughput (QPS) | Setup Time (sec) |
|---|---|---|---|---|
| Finetuning | Model=Llama-3.1-8B, GP... | 2025.03 | 15.8 | - |
| DBSA | Model=Llama-3.1-8B, GP... | 2025.03 | 12.8 | - |
| Fixed ICL | Caching Strategy=cache... | 2025.03 | 11.6 | - |
| RetICL | Caching Strategy=no ca... | 2025.03 | 1.3 | - |
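Throughput in the table is reported as queries per second (QPS). As a minimal sketch of how such a figure can be measured (the `measure_qps` helper and its signature are illustrative assumptions, not the benchmark's actual harness):

```python
import time

def measure_qps(run_query, queries):
    """Time a batch of queries and return throughput in queries per second."""
    start = time.perf_counter()
    for q in queries:
        run_query(q)  # run_query is any callable that processes one query
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed

# Example with a stand-in query function:
qps = measure_qps(lambda q: sum(range(1000)), list(range(50)))
```

In practice a benchmark harness would also discard warm-up iterations and average over several runs to reduce timing noise.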