Safety Detection on BeaverTails (held-out)
[Figure: AUROC chart, May 18, 2026. Top result: MultiLayer-Linear, 76.7 AUROC.]
Updated 13d ago
Evaluation Results
| Method | Protocol | Date | AUROC |
|---|---|---|---|
| MultiLayer-Linear | Leave-one-ben... | 2026.05 | 76.7 |
| Geometry-Lite | Leave-one-ben... | 2026.05 | 76.3 |
| Best single-layer probe | Leave-one-ben... | 2026.05 | 75.4 |
| TaT-Disp-LSTM | Leave-one-ben... | 2026.05 | 74.2 |
| MultiLayer-DIM | Leave-one-ben... | 2026.05 | 73.3 |