GCG

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Defense | GCG | ASR 0 | 91 |
| Jailbreak Detection | GCG | Accuracy 99 | 30 |
| Jailbreak Attack | GCG | ASR 96 | 27 |
| Jailbreak Attack Defense | GCG | ASR 0 | 24 |
| Harmfulness Evaluation | GCG | Harmfulness Score 1.16 | 22 |
| Adversarial Robustness | GCG | GCG Rate 0.13 | 21 |
| Adversarial Detection | GCG | DSR 100 | 18 |
| Adversarial Attack Defense | GCG Individual | BAR 100 | 18 |
| Jailbreak Attack Robustness | GCG | Harmfulness Rate 0 | 17 |
| Abnormal Behavior Detection | GCG (test) | Accuracy 100 | 17 |
| Safety Evaluation | GCG | Safety Score 95.96 | 16 |
| Jailbreak Detection | GCG | ASR 13 | 15 |
| Jailbreak Defense | GCG | ASR 0 | 13 |
| Interleaved text-mask generation | GCG (test) | METEOR 17.4 | 10 |
| Interleaved text-mask generation | GCG (val) | METEOR 17.7 | 10 |
| Text-only Jailbreak Attack Defense | GCG attack (test) | ASR 8.24 | 9 |
| Jailbreak Mitigation | GCG | GCG ASR 0 | 8 |
| Jailbreak Detection | GCG | Detection Rate 99 | 4 |
| Jailbreak Defense | GCG | LlamaGuard Score 100 | 4 |
| Prompt Injection | GCG Clean | ASR 37.02 | 4 |
| Jailbreak Resistance | GCG | Refusal Rate 96.98 | 3 |
| Grounded Conversation Generation | GCG (test) | mIoU 62.34 | 3 |
| Jailbreak Detection | GCG | Transferred Detection Rate 87 | 2 |
Showing 23 of 23 rows