
Gemma

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Circuit Discovery Evaluation | Gemma-2-2B | Clarity 82 | 70 |
| Automated Interpretability Evaluation | Gemma-2-2B | Clarity 80 | 50 |
| Watermarking Attack Robustness | Gemma 9B v2 (test) | TPR 100 | 49 |
| Negative Sentiment Backdoor Detection | Gemma 2 9B | Attack Success Rate (ASR) 0 | 48 |
| Refusal Backdoor Detection | Gemma-2-9B | ASR 0 | 42 |
| Model Steering | Gemma 2 2B Steering Evaluation Set | Granularity 1.2961 | 20 |
| Sparse Autoencoder Evaluation | Gemma-2-2B activations | L0 Count 320 | 20 |
| Jailbreak Attack | Gemma 4B 3 | NR 66 | 20 |
| Jailbreak attack | Gemma-7b five finetuned variants | Average ASR 66.2 | 16 |
| Jailbreak Attack | gemma-7b v1 (pretrained) | ASR 6 | 13 |
| LLM Alignment | Gemma-3-4B | Win Rate 94.33 | 12 |
| LLM fingerprinting | Gemma 2 2B | AUC 1 | 10 |
| Language Modeling | Gemma 3 | Accuracy 47.06 | 10 |
| Semantic Attribute Alignment | Gemma animal-attribute prompts | Happy Score 26.11 | 9 |
| Jailbreak Attack | Gemma-3 27B-it | ASR 92 | 9 |
| Model Utility | Gemma-2B-IT | Utility 57.8 | 8 |
| Contextual Question Answering | Gemma-2B-IT 5% forget set | ROUGE-L 92.4 | 8 |
| Direct Question Answering | Gemma-2B-IT 5% forget set | ROUGE-L 47.1 | 8 |
| Adversarial Attack | Gemma 27B-it 3 | Attack Success Rate (ASR) 10 | 8 |
| Transferable Adversarial Attack | Gemma 27B-it 3 | ASR (%) 30.2 | 8 |
| Neuron Description | Gemma 2 | Faithfulness 47 | 8 |
| Output-based feature description evaluation | Gemma-2 MLP SAE features | Score 49.9 | 8 |
| Output-based feature description evaluation | Gemma-2 Residual SAE features | Score 66.9 | 8 |
| Watermark Detection Robustness | Gemma-2 2B Pre-trained (PT) (test) | TPR (None) 100 | 7 |
| Watermarked text generation and detection | Gemma-2 9B Pre-trained | TPR 100 | 7 |
Showing 25 of 49 rows