
Llama

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Attention Operator Throughput | Llama 405B (128 Q-heads/8 KV-heads/128 Head-dimension) 3.1 | TFLOPS: 615.39 | 62 |
| End-to-end throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec): 449 | 60 |
| Latency Measurement | LLaMA-8B-Instruct Chunked Prefill 3.1 (inference) | Attention Latency (ms): 423.1 | 49 |
| LLM Decoding | Llama 70B 3.1 | Throughput: 3,119.55 | 48 |
| Negative Sentiment | Llama-3-8B n≈200 | ASR: 77 | 42 |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs): 153 | 36 |
| Large Language Model Inference | LLaMA-2 7B (inference) | P99 Per-Token Latency (ms): 8.23 | 33 |
| Quantization | LLAMA | Processing Time (hr): 0.25 | 30 |
| Time to First Token | Llama 3.1 8B Q4 weights | TTFT (ms): 204 | 28 |
| LLM Decoding | Llama 70B (H100 GPU Cluster) 3.1 | Throughput: 894.32 | 27 |
| Fingerprint Similarity | Llama2 7B | Similarity Score: 1 | 24 |
| Model Retrieval | Llama-8B model tree (test) | Rank: 1 | 21 |
| Decoding | Llama 70B 3.1 (inference) | Throughput: 1,410.39 | 21 |
| Jailbreak Attack | Llama 8B 3.1 | NR Rate: 96 | 20 |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s): 19.4 | 20 |
| Language Modeling | LLaMA 13B 2 | Perplexity (PPL): 4.57 | 20 |
| Persona Discovery | Llama-3.1-70B Large Target | Similarity Score: 98 | 18 |
| Persona Discovery | Llama 8B Small Target 3.1 | Similarity Score: 0.97 | 18 |
| Representation Injection Performance | Llama2-7B evaluation scenarios (test) | Accuracy: 85.16 | 18 |
| Language Modeling | LLaMA-2-7B | Perplexity: 5.47 | 18 |
| Jailbreak Attack Transferability | Llama-3B Target Transferability set | ASR: 81 | 17 |
| Jailbreak Attack | Llama 7B 2 | ASR: 97 | 17 |
| Persona Discrimination | Llama Cross-generator 3.3-70B | Persona Separability (Δ): 0.427 | 16 |
| Transferable Adversarial Attack | Llama 11B-V 3.2 | Attack Success Rate (ASR): 57.3 | 16 |
| LLM Inference | LLaMA-7B v1 (serving) | Decode Latency (ms/token): 12.16 | 16 |

Showing 25 of 229 rows