Language Model Inference on Llama-2 7B-Chat
[Chart: Latency (ms/token). ARC engine: 76 (Mar 26, 2026).]
Updated 23d ago
Evaluation Results
Method            Details                     Date      Latency (ms/token)
ARC engine        Hardware Backend=GPU (...   2026.03   76
ARC engine        Hardware Backend=CPU (...   2026.03   139
Candle Q4 float   Hardware Backend=M2 Ul...   2026.03   175
Candle Q4 float   Hardware Backend=Vultr...   2026.03   1,250