
Mistral

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Adversarial Attack | Mistral 7B | ASR 100 | 45 |
| Large Language Model Watermarking | Mistral-7B-Instruct (test) | Perplexity (PPL) 1.37 | 34 |
| Language Modeling | Mistral-7B | Perplexity (Mistral-7B) 5.45 | 24 |
| Jailbreak Attack | Mistral-7B | NR 40 | 20 |
| Hallucination Tracing | Mistral | Recall@k 78.95 | 15 |
| Adversarial Jailbreak Attack | Mistral 7B | Attack Success Rate (ASR) 100 | 13 |
| LLM fingerprinting | Mistral-7B | AUC 1 | 10 |
| Model Stealing Attacks | Mistral | BERT Score 0.985 | 9 |
| Steganography | Mistral v0.3 | Entropy (bit/token) 0.9827 | 9 |
| Watermarking Detection | Mistral-7B | AUC 1 | 7 |
| Watermark Detection | Mistral | Detection Rate 98.1 | 7 |
| Post-training Safety and Utility Alignment | Mistral-7B | Unsafe Rate (%) 0 | 7 |
| LLM Jailbreaking | Mistral-RB | SRF 58 | 6 |
| Jailbreak Defense | Mistral-7B-Instruct | GCG Attack Count 4 | 6 |
| LLM Jailbreaking | Mistral CB | Success Rate First (SRF) 72 | 4 |
| Object Placement | Mistral (unseen) | Object Count 68.32 | 4 |
| Peak VRAM measurement | Mistral-Sm-24B | Peak VRAM (RTX) 70.6 | 4 |
| Adversarial Attack Diversity Analysis | Mistral-7B | Average Attack Similarity 0.336 | 3 |
| Red-teaming | Mistral-7B | Attack Success Rate (ASR) 56.7 | 3 |
| Safety and Utility Evaluation | Mistral-7B 8-bit quantized | Unsafe Rate 0 | 3 |
| KV Cache Quantization | Mistral | ΔPPL 0.0012 | 3 |
| Energy consumption ranking | Mistral workload 7B | Pairwise Accuracy 99.3 | 2 |
| Model Lineage Attestation | Mistral family | TPR 0.99 | 1 |