| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Adversarial Attack | Mistral 7B | ASR | 100 | 45 |
| Large Language Model Watermarking | Mistral-7B-Instruct (test) | Perplexity (PPL) | 1.37 | 34 |
| Language Modeling | Mistral-7B | Perplexity (Mistral-7B) | 5.45 | 24 |
| Jailbreak Attack | Mistral-7B | NR | 40 | 20 |
| Hallucination Tracing | Mistral | Recall@k | 78.95 | 15 |
| Adversarial Jailbreak Attack | Mistral 7B | Attack Success Rate (ASR) | 100 | 13 |
| LLM fingerprinting | Mistral-7B | AUC | 1 | 10 |
| Model Stealing Attacks | Mistral | BERT Score | 0.985 | 9 |
| Steganography | Mistral v0.3 | Entropy (bit/token) | 0.9827 | 9 |
| Watermarking Detection | Mistral-7B | AUC | 1 | 7 |
| Watermark Detection | Mistral | Detection Rate | 98.1 | 7 |
| Post-training Safety and Utility Alignment | Mistral-7B | Unsafe Rate (%) | 0 | 7 |
| LLM Jailbreaking | Mistral-RB | SRF | 58 | 6 |
| Jailbreak Defense | Mistral-7B-Instruct | GCG Attack Count | 4 | 6 |
| LLM Jailbreaking | Mistral CB | Success Rate First (SRF) | 72 | 4 |
| Object Placement | Mistral (unseen) | Object Count | 68.32 | 4 |
| Peak VRAM measurement | Mistral-Sm-24B | Peak VRAM (RTX) | 70.6 | 4 |
| Adversarial Attack Diversity Analysis | Mistral-7B | Average Attack Similarity | 0.336 | 3 |
| Red-teaming | Mistral-7B | Attack Success Rate (ASR) | 56.7 | 3 |
| Safety and Utility Evaluation | Mistral-7B 8-bit quantized | Unsafe Rate | 0 | 3 |
| KV Cache Quantization | Mistral | ΔPPL | 0.0012 | 3 |
| Energy consumption ranking | Mistral workload 7B | Pairwise Accuracy | 99.3 | 2 |
| Model Lineage Attestation | Mistral family | TPR | 0.99 | 1 |