
Llama

Benchmarks

| Task Name | Dataset / Model | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| End-to-end Throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec) | 449 | 60 |
| LLM Decoding | Llama 3.1 70B | Throughput | 3,119.55 | 48 |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs) | 153 | 36 |
| Quantization | LLaMA | Processing Time (hr) | 0.25 | 30 |
| Time to First Token | Llama 3.1 8B, Q4 weights | TTFT (ms) | 204 | 28 |
| Attention Operator Throughput | Llama 3.1 405B (128 Q-heads / 8 KV-heads / 128 head dim) | TFLOPS | 225.3 | 28 |
| LLM Decoding | Llama 3.1 70B (H100 GPU cluster) | Throughput | 894.32 | 27 |
| Fingerprint Similarity | Llama2 7B | Similarity Score | 1 | 24 |
| Model Retrieval | Llama-8B model tree (test) | Rank | 1 | 21 |
| Decoding | Llama 3.1 70B (inference) | Throughput | 1,410.39 | 21 |
| Jailbreak Attack | Llama 3.1 8B | NR Rate | 96 | 20 |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s) | 19.4 | 20 |
| Language Modeling | LLaMA-2 13B | Perplexity (PPL) | 4.57 | 20 |
| Representation Injection Performance | Llama2-7B evaluation scenarios (test) | Accuracy | 85.16 | 18 |
| Language Modeling | LLaMA-2-7B | Perplexity | 5.47 | 18 |
| Post-training Performance Evaluation | Llama 3.2-3B (val) | Max Ppost | 70.1 | 15 |
| OOD Detection | LLaMA 1 (test) | AUROC | 0.924 | 15 |
| Hallucination Detection | LLaMA 1 (test) | AUROC | 0.894 | 15 |
| Quantization | LLaMA v1 (train) | Processing Time (hr) | 0.25 | 15 |
| Constrained LLM Decoding | Llama-3-8B | Inference Time (ms) | 11.77 | 14 |
| Watermark Detection | Llama-3-8B-Instruct, 150 tokens (generations) | Mean P | 9 | 13 |
| Watermark Detection | Llama-3-8B-Instruct, 30 tokens (generations) | Mean Precision | 23 | 13 |
| Knowledge Editing | LLaMA-3 | Average Runtime | 8.1 | 12 |
| Latency Measurement | LLaMA-2 13B linear layers (inference) | Latency (ms) | 0.0499 | 12 |
| LLM Generation | Llama-2-7B | Latency (LAN) | 22.1 | 12 |
Showing 25 of 137 rows