| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Large Language Model Watermarking | Mistral-7B-Instruct (test) | Perplexity (PPL) | 1.37 | 34 |
| Language Modeling | Mistral-7B | Perplexity (Mistral-7B) | 5.45 | 24 |
| Jailbreak Attack | Mistral-7B | NR | 40 | 20 |
| Hallucination Tracing | Mistral | Recall@k | 78.95 | 15 |
| LLM Fingerprinting | Mistral-7B | AUC | 1 | 10 |
| Jailbreak Defense | Mistral-7B-Instruct | GCG Attack Count | 4 | 6 |
| Peak VRAM Measurement | Mistral-Sm-24B | Peak VRAM (RTX) | 70.6 | 4 |
| KV Cache Quantization | Mistral | ΔPPL | 0.0012 | 3 |
| Model Lineage Attestation | Mistral family | TPR | 0.99 | 1 |