| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Circuit Discovery Evaluation | Gemma-2-2B | Clarity 82 | 70 |
| Automated Interpretability Evaluation | Gemma-2-2B | Clarity 80 | 50 |
| Jailbreak Attack | Gemma 4B 3 | NR 66 | 20 |
| Jailbreak attack | Gemma-7b five finetuned variants | Average ASR 66.2 | 16 |
| Jailbreak Attack | gemma-7b v1 (pretrained) | ASR 6 | 13 |
| LLM Alignment | Gemma-3-4B | Win Rate 94.33 | 12 |
| LLM fingerprinting | Gemma 2 2B | AUC 1 | 10 |
| Language Modeling | Gemma 3 | Accuracy 47.06 | 10 |
| Jailbreak Attack | Gemma-3 27B-it | ASR 92 | 9 |
| Neuron Description | Gemma 2 | Faithfulness 47 | 8 |
| Output-based feature description evaluation | Gemma-2 MLP SAE features | Score 49.9 | 8 |
| Output-based feature description evaluation | Gemma-2 Residual SAE features | Score 66.9 | 8 |
| Debiasing | Gemma-3-4b-it (test) | Mean Log-Likelihood Difference 5.07 | 6 |
| Multi-path Speculative Decoding | Gemma (test) | Throughput (tokens/s) 13.26 | 6 |
| Chat Fine-tuning | Gemma 1B Chat | vNMSE 0.0012 | 6 |
| LLM Attack Effectiveness | Gemma3 12B-it | TTFT (s) 0.13 | 6 |
| Multi-path speculative decoding | Gemma held-out (test) | Throughput Ratio Improvement 2.17 | 5 |
| Adversarial Attack | Gemma 4B-it 3 | ASR 25 | 5 |
| Opaque Serial Depth Calculation | Gemma 3 | Final Depth Formula 11,322 | 4 |
| Explanation Evaluation | Gemma vision encoder later layer SAE 3 (test) | IoU (Masks) 20.4 | 3 |
| Model Lineage Attestation | Gemma family | TPR 98 | 1 |