Safety on AttaQ
[Chart: Average Score by date; top result 0.81, REINFORCE++ (Ours), Dec 1, 2025]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Average Score |
|---|---|---|---|
| REINFORCE++ (Ours) | Base Model=Qwen3-8B, T... | 2025.12 | 0.81 |
| Qwen3-8B + CPO | Base Model=Qwen3-8B, T... | 2025.12 | 0.79 |
| REINFORCE++ (Ours) | Base Model=DeepSeek-R1... | 2025.12 | 0.78 |
| Qwen3-8B + SFT (STAR-1) | Base Model=Qwen3-8B, T... | 2025.12 | 0.78 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (STAR-1) | Base Model=DeepSeek-R1... | 2025.12 | 0.76 |
| Qwen3-8B + SFT (R2D-R1) | Base Model=Qwen3-8B, T... | 2025.12 | 0.75 |
| Qwen3-8B (thinking) | Base Model=Qwen3-8B, T... | 2025.12 | 0.73 |
| DeepSeek-R1-Distill-Qwen-7B + CPO | Base Model=DeepSeek-R1... | 2025.12 | 0.59 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (R2D-R1) | Base Model=DeepSeek-R1... | 2025.12 | 0.56 |
| Qwen3-8B + SFT (SafeChain) | Base Model=Qwen3-8B, T... | 2025.12 | 0.49 |
| DeepSeek-R1-Distill-Qwen-7B | Base Model=DeepSeek-R1... | 2025.12 | 0.37 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (SafeChain) | Base Model=DeepSeek-R1... | 2025.12 | 0.37 |