| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Jailbreak Attack | Gemini Flash 2.5 (test) | ASR 0 | 27 | |
| Persona Discovery | Gemini Flash (Small Target) 2.5 | Similarity Score 98 | 18 | |
| Targeted Attack | Gemini 1.5-pro 2.5-flash (test) | ASR 67.4 | 16 | |
| Adversarial Attack | Gemini 2.0 | ASR 41.3 | 11 | |
| AI-Generated Text Detection | Gemini-2.0 Flash generated text | AUROC (Insertion) 99.34 | 10 | |
| Black-box Adversarial Attack | Gemini 2.5-Pro | KMRa 0.87 | 9 | |
| Jailbreaking | Gemini Pro 3 | ASR 92.5 | 9 | |
| Targeted Attack | Gemini-3-flash closed-source standard MLLMs | Attack Success Rate (ASR) 4 | 8 | |
| Targeted Attack | Gemini-1.5 3-flash (test) | ASR 50.8 | 8 | |
| Targeted Adversarial Attack | Gemini 3.1 | Attack Success Rate (ASR) 70.2 | 8 | |
| Targeted Adversarial Attack | Gemini 2.5 | ASR 81.3 | 8 | |
| Image Captioning | Gemini Image Captioning Hard Criterion 1.5 | ASR 81 | 8 | |
| Multi-shot video generation | Gemini 100 multi-shot video prompts 2.5 Pro | Intra-shot Consistency (Subject) 0.825 | 8 | |
| AI-Generated Text Detection | Gemini-3 generated text | AUROC 92.84 | 7 | |
| Safety Auditing | Gemini flash 1.5 | Detoxify Score 81.33 | 5 | |
| Policy Corruption Evaluation | Gemini-2-Flash | Compliance 3.65 | 5 | |
| Adversarial Attack | Gemini-3-flash | ASR 51 | 4 | |
| Keyword Matching Attack | Gemini flash 1.5 | KMR (alpha) 83 | 4 | |
| Jailbreak Attack | Gemini Flash 3 | Attack Success Rate 90.5 | 4 | |
| Targeted Adversarial Attack | Gemini Flash 1.5 | Attack Success Rate (T1) 58 | 4 | |
| Targeted Adversarial Attack | Gemini 2.0 | ASR520 | 4 | |