| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Model Retrieval | Mistral-7B model tree (test) | Rank1 | 21 |
| Targeted Refusal | Mistral-7B Generation Evaluation Set | CA 97.01 | 15 |
| Sentiment Steering | Mistral-7B Generation (Evaluation Set) | Control Accuracy (CA) 96.38 | 15 |
| Model Fingerprinting Detection | Mistral-7B black-box setting v0.3 | True Positive Rate (TPR) 98.4 | 10 |
| Language Modeling | Mistral-7B Long-context (8k window) | Perplexity 4.568 | 8 |
| Language Modeling | Mistral-7B Long-context (4k window) | Perplexity 5.241 | 8 |
| Jailbreak Defense | Mistral-7B Jailbreak Evaluation | GCG Attack Success Rate 0 | 6 |
| Adversarial Attack | Mistral-7B (successful attacks) | Unique Queries 3,021 | 3 |
| Text Generation | Mistral-7B v0.3 (test) | S-BLEU 34.2 | 3 |
| Weight Quantization Fidelity | Mistral-7B | MSE 5.36 | 2 |