
Llama2

Benchmarks

| Task | Dataset / Setting | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Attention Operator Throughput | Llama2 7B (32 Q-heads / 32 KV-heads / 128 head-dim) | Attention TFLOPS | 207.3 | 30 |
| Jailbreak Attack | Llama2-7B, five finetuned variants | Average ASR | 0 | 16 |
| Accuracy | LLaMA2-7B, zero-shot | Zero-Shot Accuracy | 67.18 | 16 |
| Targeted Refusal | Llama2-7B Generation Evaluation Set | Completion Accuracy (CA) | 93.14 | 15 |
| Sentiment Steering | Llama2-7B Generation Evaluation Set | Accuracy (CA) | 90.15 | 15 |
| Multi-bit Watermarking | LLaMA2-7B, 300 tokens (test) | Perplexity | 7.0486 | 14 |
| Inference Efficiency | LLaMA2-7B, 12/128 tokens | Latency | 1.889 | 13 |
| Jailbreak Attack | Llama2-7B v1 (pretrained) | ASR | 0 | 13 |
| Watermark Detection | Llama2-7B, copy-paste attack | F1 Score | 97.8 | 11 |
| LLM Fingerprinting | Llama2 7B | AUC | 100 | 10 |
| Watermark Detection | Llama2-7B, paragraphing | F1 Score | 91.6 | 8 |
| Watermark Detection | Llama2-7B, synonymous substitution | F1 Score | 98.5 | 8 |
| Watermark Detection | Llama2-7B, clean | F1 Score | 100 | 8 |
| LLM Inference | LLaMA2 7B | Latency (ms) | 1,052.24 | 4 |
| LLM Training | Llama2-70B (64 × H100-8) | Iteration Time (s) | 7.8 | 4 |
| LLM Training | Llama2 7B | Iteration Time (s) | 1.4 | 4 |
| Quantization | LLaMA2-7B | Averaged Quantization Time (s) | 24 | 4 |
| Text Perplexity Evaluation | Llama2 7B | PPL (Trial 1) | 7.626 | 3 |
| Watermark Detection (Paraphrasing Attack) | Llama2-7B | F1 Score | 91.8 | 3 |
| Inference Latency | LLaMA2 70B | Latency (ms) | 1,450 | 3 |
| Knowledge Editing | LLaMA2-13B, sequential batch-editing setup | S Score | 84.7 | 3 |
| LLM Training | Llama2-7B (tpu-v5p-512) | Iteration Time (s) | 2.5 | 3 |
| LLM Training | Llama2 70B (tpu-v5p-1024) | Iteration Time (s) | 11.6 | 2 |
| LLM Training | Llama2 70B (64 × Trainium2-16) | Iteration Time (s) | 11.2 | 1 |
| LLM Training | Llama2 7B (64 × Trainium2-16) | Iteration Time (s) | 1.2 | 1 |
(25 of 26 rows shown.)