| Task Name | Dataset / Model | SOTA Result | Trend |
|---|---|---|---|
| End-to-end Throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec): 449 | 60 |
| LLM Decoding | Llama 3.1 70B | Throughput: 3,119.55 | 48 |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs): 153 | 36 |
| Quantization | LLaMA | Processing Time (hr): 0.25 | 30 |
| Time to First Token | Llama 3.1 8B (Q4 weights) | TTFT (ms): 204 | 28 |
| Attention Operator Throughput | Llama 3.1 405B (128 Q-heads / 8 KV-heads / head dim 128) | TFLOPS: 225.3 | 28 |
| LLM Decoding | Llama 3.1 70B (H100 GPU cluster) | Throughput: 894.32 | 27 |
| Fingerprint Similarity | Llama2 7B | Similarity Score: 1 | 24 |
| Model Retrieval | Llama-8B model tree (test) | Rank: 1 | 21 |
| Decoding | Llama 3.1 70B (inference) | Throughput: 1,410.39 | 21 |
| Jailbreak Attack | Llama 3.1 8B | NR Rate: 96 | 20 |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s): 19.4 | 20 |
| Language Modeling | LLaMA-2 13B | Perplexity (PPL): 4.57 | 20 |
| Representation Injection Performance | Llama2-7B evaluation scenarios (test) | Accuracy: 85.16 | 18 |
| Language Modeling | LLaMA-2-7B | Perplexity: 5.47 | 18 |
| Post-training Performance Evaluation | Llama 3.2-3B (val) | Max P_post: 70.1 | 15 |
| OOD Detection | LLaMA 1 (test) | AUROC: 0.924 | 15 |
| Hallucination Detection | LLaMA 1 (test) | AUROC: 0.894 | 15 |
| Quantization | LLaMA v1 (train) | Processing Time (hr): 0.25 | 15 |
| Constrained LLM Decoding | Llama-3-8B | Inference Time (ms): 11.77 | 14 |
| Watermark Detection | Llama-3-8B-Instruct, 150 tokens (generations) | Mean P: 9 | 13 |
| Watermark Detection | Llama-3-8B-Instruct, 30 tokens (generations) | Mean Precision: 23 | 13 |
| Knowledge Editing | LLaMA-3 | Average Runtime: 8.1 | 12 |
| Latency Measurement | LLaMA 13B linear layers (inference) | Latency (ms): 0.0499 | 12 |
| LLM Generation | Llama-2-7B | Latency (LAN): 22.1 | 12 |