Gemini

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Jailbreak Attack | Gemini Flash 2.5 (test) | ASR | 0 | 27 |
| Persona Discovery | Gemini Flash (Small Target) 2.5 | Similarity Score | 98 | 18 |
| Targeted Attack | Gemini 1.5-pro 2.5-flash (test) | ASR | 67.4 | 16 |
| Adversarial Attack | Gemini 2.0 | ASR | 41.3 | 11 |
| AI-Generated Text Detection | Gemini-2.0 Flash generated text | AUROC (Insertion) | 99.34 | 10 |
| Black-box Adversarial Attack | Gemini 2.5-Pro | KMRa | 0.87 | 9 |
| Jailbreaking | Gemini Pro 3 | ASR | 92.5 | 9 |
| Targeted Attack | Gemini-3-flash closed-source standard MLLMs | Attack Success Rate (ASR) | 4 | 8 |
| Targeted Attack | Gemini-1.5 3-flash (test) | ASR | 50.8 | 8 |
| Targeted Adversarial Attack | Gemini 3.1 | Attack Success Rate (ASR) | 70.2 | 8 |
| Targeted Adversarial Attack | Gemini 2.5 | ASR | 81.3 | 8 |
| Image Captioning | Gemini Image Captioning Hard Criterion 1.5 | ASR | 81 | 8 |
| Multi-shot video generation | Gemini 100 multi-shot video prompts 2.5 Pro | Intra-shot Consistency (Subject) | 0.825 | 8 |
| AI-Generated Text Detection | Gemini-3 generated text | AUROC | 92.84 | 7 |
| Safety Auditing | Gemini flash 1.5 | Detoxify Score | 81.33 | 5 |
| Policy Corruption Evaluation | Gemini-2-Flash | Compliance | 3.65 | 5 |
| Adversarial Attack | Gemini-3-flash | ASR | 51 | 4 |
| Keyword Matching Attack | Gemini flash 1.5 | KMR (alpha) | 83 | 4 |
| Jailbreak Attack | Gemini Flash 3 | Attack Success Rate | 90.5 | 4 |
| Targeted Adversarial Attack | Gemini Flash 1.5 | Attack Success Rate (T1) | 58 | 4 |
| Targeted Adversarial Attack | Gemini 2.0 | ASR | 520 | 4 |