Gemma

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result
Circuit Discovery Evaluation	Gemma-2-2B	Clarity	82	70
Automated Interpretability Evaluation	Gemma-2-2B	Clarity	80	50
Watermarking Attack Robustness	Gemma 9B v2 (test)	TPR	100	49
Negative Sentiment Backdoor Detection	Gemma 2 9B	Attack Success Rate (ASR)	0	48
Refusal Backdoor Detection	Gemma-2-9B	ASR	0	42
Decoder Reconstruction	GEMMA-3-4B-IT	ROUGE-1 Score	99.1	40
Decoder Reconstruction	Gemma 4B IT 3	ROUGE-1	98.6	24
Model Steering	Gemma 2 2B Steering Evaluation Set	Granularity	1.2961	20
Sparse Autoencoder Evaluation	Gemma-2-2B activations	L0 Count	320	20
Jailbreak Attack	Gemma 4B 3	NR	66	20
Jailbreak attack	Gemma-7b five finetuned variants	Average ASR	66.2	16
Content Injection	Gemma-2B	Keyword Occurrence (%)	92.9	15
Jailbreak Attack	gemma-7b v1 (pretrained)	ASR	6	13
Over-refusal Mitigation	Gemma-2B Over-refusal	Informative Refusal Rate	1.6	12
LLM Alignment	Gemma-3-4B	Win Rate	94.33	12
Text Quality Assessment	Gemma-7B Unmodified watermarked text quality	Perplexity (PPL)	12.188	11
Text Watermark Detection	Gemma-7B Post-translation to French human-authored texts	TPR @ 1% FPR	59.2	11
Text Watermark Detection	Gemma-7B Post-translation to German human-authored texts	TPR @ 1% FPR	53.4	11
Text Watermark Detection	Gemma-7B Post-paraphrasing human-authored texts	TPR @ 1% FPR	84	11
Text Watermark Detection	Gemma-7B Unmodified human-authored texts	TPR @ 1% FPR	100	11
LLM fingerprinting	Gemma 2 2B	AUC	1	10
Language Modeling	Gemma 3	Accuracy	47.06	10
SAE Feature Attribution	Gemma-3-4B 16K SAE	Input Success Rate	14.7	9
SAE Feature Attribution	Gemma-3-1B 16K SAE	Input Attribution	14.71	9
SAE Feature Attribution	Gemma-3-270M 16K SAE	Input Success Rate (%)	17.3	9

Showing 25 of 76 rows