Claude

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Jailbreak Attack | Claude 3.5 | ASR 0 | 24
Jailbreak Attack | Claude Sonnet API 3.5 | ASR 80.5 | 16
AI-Generated Text Detection | Claude Sonnet 3.7 | AUROC (Insertion) 0.9964 | 10
Black-box Adversarial Attack | Claude thinking 4.0 | KMR (a) 0.02 | 9
Jailbreaking | Claude 4.5 | ASR 97 | 9
Targeted Attack | Claude-3-Opus 4.6 (test) | ASR 76.8 | 8
Targeted Adversarial Attack | Claude 4.7 | ASR 76.5 | 8
Targeted Adversarial Attack | Claude 4.6 | Attack Success Rate (ASR) 69.2 | 8
AI-generated text detection | Claude-generated (test) | F1 Score 92.2 | 5
Keyword Matching Attack | Claude-3-Opus | KMR (alpha) 92 | 4
Adversarial Attack Transfer | Claude 3.5 | Similarity Score (SS) 64.5 | 3