
Llama

Benchmarks

| Task | Dataset / Model | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| End-to-end Throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec) | 449 | 60 |
| LLM Decoding | Llama 3.1 70B | Throughput | 3,119.55 | 48 |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs) | 153 | 36 |
| Quantization | LLaMA | Processing Time (hr) | 0.25 | 30 |
| Attention Operator Throughput | Llama 3.1 405B (128 Q-heads / 8 KV-heads / 128 head-dim) | TFLOPS | 225.3 | 28 |
| LLM Decoding | Llama 3.1 70B (H100 GPU cluster) | Throughput | 894.32 | 27 |
| Model Retrieval | Llama-8B model tree (test) | Rank | 1 | 21 |
| Decoding | Llama 3.1 70B (inference) | Throughput | 1,410.39 | 21 |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s) | 19.4 | 20 |
| Language Modeling | LLaMA-2 13B | Perplexity (PPL) | 4.57 | 20 |
| Fingerprint Similarity | Llama2 7B | Similarity Score | 0.995 | 17 |
| Post-training Performance Evaluation | Llama 3.2-3B (val) | Max Ppost | 70.1 | 15 |
| OOD Detection | LLaMA-1 (test) | AUROC | 0.924 | 15 |
| Hallucination Detection | LLaMA-1 (test) | AUROC | 0.894 | 15 |
| Quantization | LLaMA v1 (train) | Processing Time (hr) | 0.25 | 15 |
| Constrained LLM Decoding | Llama-3-8B | Inference Time (ms) | 11.77 | 14 |
| Watermark Detection | Llama-3-8B-Instruct, 150 tokens (generations) | Mean P | 9 | 13 |
| Watermark Detection | Llama-3 8B Instruct, 30 tokens (generations) | Mean Precision | 23 | 13 |
| LLM Generation | Llama-2-7B | Latency (LAN) | 22.1 | 12 |
| LLM Training Optimization | Llama 3.2 3B | Training Time Reduction (%) | 12.3 | 12 |
| Language Modeling | LLaMA Pretraining | Final Loss | 2.74 | 12 |
| Large Language Model Training Efficiency | Llama 3.2 1.7B | Energy Reduction (Iso-Time) | 28.3 | 11 |
| Language Modeling | LLaMA-2-7B | Perplexity | 5.47 | 11 |
| Language Modeling | LLaMA-350M pre-training (val) | Validation Loss | 2.707 | 10 |
| INT2 Quantization | LLaMA-30B | Memory Footprint (GB) | 18.59 | 10 |
Showing 25 of 86 rows.