LLM Red-teaming on GFN-defended Target Model
[Chart: Unsuccessful Attack Rate (UA) and Attack Success Rate (ASR) by method, May 1, 2026]
Updated 1mo ago
Evaluation Results
Method                        Date      Unsuccessful Attack Rate (UA)   Attack Success Rate (ASR)
PPO                           2026.05   0.33                            0.03
SFT                           2026.05   0.67                            0.07
PPO + Curiosity               2026.05   1                               0.1
DPO                           2026.05   2                               0.2
Rainbow Teaming               2026.05   3.33                            0.33
GFN                           2026.05   5                               4.69
ICL                           2026.05   5.33                            0.52
Jailbreak R1 (CoT=enabled)    2026.05   30.33                           2.96
S-GFN                         2026.05   43.33                           22.53