Selective Refusal Editing on HarmBench (Gemma-3-4B-IT held-out split)
[Chart: Edit Refusal Rate by date. Highlighted point: Base model (no intervention), 88.6, May 18, 2026. Selectable metrics: Edit Refusal Rate, Benign Preservation, Harmful Preservation, Harmful Refusal Rate, Harmful Drift.]
Updated 13d ago
Evaluation Results
| Method | Type | Date | Edit Refusal Rate | Benign Preservation | Harmful Preservation | Harmful Refusal Rate | Harmful Drift |
|---|---|---|---|---|---|---|---|
| Base model (no intervention) | reference | 2026.05 | 88.6 | 100 | 100 | 81.6 | 0 |
| Residual Paving thresholded-soft S4/T2 | learned route | 2026.05 | 4 | 95.5 | 87.3 | 65.3 | -16.3 |
| Residual Paving oracle S4/T2 | diagnostic | 2026.05 | 0.2 | 100 | 100 | 81.6 | 0 |
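The page does not define Harmful Drift. The reported values are, however, consistent with one simple reading: each method's Harmful Refusal Rate minus the base model's Harmful Refusal Rate (for example, 65.3 - 81.6 = -16.3). The sketch below checks that arithmetic against the three rows above. The function name `drift_from_base` and the definition itself are assumptions made for illustration, not something the benchmark states.

```python
# Values copied from the results above.
rows = {
    "Base model (no intervention)": {"harmful_refusal": 81.6, "harmful_drift": 0.0},
    "Residual Paving thresholded-soft S4/T2": {"harmful_refusal": 65.3, "harmful_drift": -16.3},
    "Residual Paving oracle S4/T2": {"harmful_refusal": 81.6, "harmful_drift": 0.0},
}

BASE_REFUSAL = rows["Base model (no intervention)"]["harmful_refusal"]


def drift_from_base(harmful_refusal, base_refusal=BASE_REFUSAL):
    """Assumed definition (not given by the source): change in
    Harmful Refusal Rate relative to the base model, to one decimal."""
    return round(harmful_refusal - base_refusal, 1)


for name, row in rows.items():
    computed = drift_from_base(row["harmful_refusal"])
    status = "matches" if computed == row["harmful_drift"] else "differs"
    print(f"{name}: computed {computed}, reported {row['harmful_drift']} ({status})")
```

If the benchmark defines drift differently, the agreement here may be coincidental for these three rows, so treat this only as a consistency check on the reported numbers.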