Harmful Refusal on WG (test)
[Chart: ASR over time, x-axis from Oct 9, 2025 to May 8, 2026. Top entry: SELF-REDTEAM, ASR 11.6.]
Updated 22d ago
Evaluation Results
Method               Configuration              Date     ASR
-------------------  -------------------------  -------  ----
SELF-REDTEAM         Backbone=Qwen2.5-14B-I...  2026.05  11.6
ABS                  Backbone=Qwen2.5-3B-IT...  2026.05  12.2
ABS                  Backbone=Qwen2.5-14B-I...  2026.05  13.1
Self-Play + SFT      Base Model=Llama-3.1-8...  2025.10  13.8
Qwen2.5-14B-IT       Backbone=Qwen2.5-14B-IT    2026.05  14.6
ABS                  Backbone=Qwen2.5-7B-IT...  2026.05  16.9
Self-Play            Base Model=Llama-3.1-8...  2025.10  17.2
SELF-REDTEAM         Backbone=Qwen2.5-7B-IT...  2026.05  17.8
SFT                  Base Model=Llama-3.1-8...  2025.10  18.3
ELS                  Base Model=Llama-3.1-8...  2025.10  21.9
Llama-3.1-8B-IT      Model Status=Base Mode...  2025.10  22.3
SELF-REDTEAM         Backbone=Qwen2.5-3B-IT...  2026.05  23.4
Defender-Only + SFT  Base Model=Llama-3.1-8...  2025.10  25.1
Defender-Only        Base Model=Llama-3.1-8...  2025.10  27.6
Qwen2.5-7B-IT        Backbone=Qwen2.5-7B-IT     2026.05  27.9
Qwen2.5-3B-IT        Backbone=Qwen2.5-3B-IT     2026.05  28.2