Safety Evaluation on Safety Evaluation Suite WJ, FH, SR
[Chart: ASR (WJ) over time. Top result: WaltzRL, 4.6 (Oct 9, 2025). Selectable metrics: ASR (WJ), ASR (FH), ASR (SR), Average ASR.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | ASR (WJ) | ASR (FH) | ASR (SR) | Average ASR |
|---|---|---|---|---|---|---|
| WaltzRL | Method Identity=Method 7 | 2025.10 | 4.6 | 6.2 | 0.3 | 3.7 |
| Single-model RL + Safeguard | Safeguard=Llama Guard 4 | 2025.10 | 7.3 | 8.4 | 0.3 | 5.3 |
| Oracle label-converted feedback | Feedback Source=Ground... | 2025.10 | 10.6 | 10.4 | 0 | 7 |
| Single-model RL | Description=Traditiona... | 2025.10 | 13.2 | 22.8 | 0.6 | 12.2 |
| Baseline response + Safeguard | Backbone=Llama-3.1-8B-... | 2025.10 | 16 | 11 | 0 | 9 |
| Inference-time collaboration | Backbone=Llama-3.1-8B-... | 2025.10 | 19.4 | 17 | 3.8 | 13.4 |
| Baseline response | Backbone=Llama-3.1-8B-... | 2025.10 | 39 | 40.4 | 0 | 26.5 |
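In every row of this table, Average ASR matches the unweighted mean of the three suite-level ASR values (WJ, FH, SR) to within one-decimal rounding. For example, for Baseline response (39 + 40.4 + 0) / 3 ≈ 26.47, reported as 26.5. The sketch below recomputes that mean and ranks methods by it. It assumes ASR is an attack success rate, so lower is better. The helper names (`average_asr`, `check_and_rank`) are hypothetical and not part of the benchmark. Because the table is ordered by ASR (WJ), ranking by the average changes the order in one place: Baseline response + Safeguard (9) comes ahead of Single-model RL (12.2).

```python
from statistics import mean

# Values copied from the Evaluation Results table:
# (ASR (WJ), ASR (FH), ASR (SR), reported Average ASR)
RESULTS = {
    "WaltzRL": (4.6, 6.2, 0.3, 3.7),
    "Single-model RL + Safeguard": (7.3, 8.4, 0.3, 5.3),
    "Oracle label-converted feedback": (10.6, 10.4, 0.0, 7.0),
    "Single-model RL": (13.2, 22.8, 0.6, 12.2),
    "Baseline response + Safeguard": (16.0, 11.0, 0.0, 9.0),
    "Inference-time collaboration": (19.4, 17.0, 3.8, 13.4),
    "Baseline response": (39.0, 40.4, 0.0, 26.5),
}


def average_asr(wj: float, fh: float, sr: float) -> float:
    """Hypothetical helper: unweighted mean of the three per-suite ASR values."""
    return mean((wj, fh, sr))


def check_and_rank(results):
    """Hypothetical helper: verify each reported average against the recomputed
    mean (allowing for one-decimal rounding), then return method names sorted
    by average ASR, lowest first (assuming lower ASR means safer)."""
    averages = {}
    for method, (wj, fh, sr, reported) in results.items():
        avg = average_asr(wj, fh, sr)
        # Reported averages are rounded to one decimal place.
        if abs(avg - reported) > 0.05:
            raise ValueError(f"{method}: recomputed {avg:.2f}, reported {reported}")
        averages[method] = avg
    return sorted(averages, key=averages.get)


if __name__ == "__main__":
    for name in check_and_rank(RESULTS):
        wj, fh, sr, _ = RESULTS[name]
        print(f"{name}: {average_asr(wj, fh, sr):.1f}")
```

If any reported average did not match the recomputed mean, `check_and_rank` would raise. Its absence of errors on this data is what supports reading Average ASR as a simple mean.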