Safety Performance on JBB
[Chart: Selective Refusal Score (Δs); Dec 7, 2025]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Selective Refusal Score (Δs) |
|---|---|---|---|
| No Steering | Base Model=Llama 3 8B,... | 2025.12 | -9.1 |
| Prompt guardrails | Base Model=Llama 3 8B,... | 2025.12 | 10.2 |
| CAA | Base Model=Llama 3 8B,... | 2025.12 | 30.1 |
| SAE steering | Base Model=Llama 3 8B,... | 2025.12 | 36 |
| GSAE-1D | Base Model=Llama 3 8B,... | 2025.12 | 40.1 |
| Random graphs | Base Model=Llama 3 8B,... | 2025.12 | 44.2 |
| SafeSwitch | Base Model=Llama 3 8B,... | 2025.12 | 51.4 |
| No gating | Base Model=Llama 3 8B,... | 2025.12 | 64.1 |
| Gradient Cuff | Base Model=Llama 3 8B,... | 2025.12 | 68 |
| Input Gate Only | Base Model=Llama 3 8B,... | 2025.12 | 70.3 |
| GSAE | Base Model=Llama 3 8B,... | 2025.12 | 76.2 |