Safety Detection on ToxicChat (held-out)
Top result: MultiLayer-DIM, 87.7 AUROC
[Chart: AUROC over time, May 18, 2026]
Updated 13d ago
Evaluation Results
Method                    Protocol            Date      AUROC
MultiLayer-DIM            Leave-one-ben...    2026.05   87.7
TaT-Disp-LSTM             Leave-one-ben...    2026.05   81.8
MultiLayer-Linear         Leave-one-ben...    2026.05   79.9
Geometry-Lite             Leave-one-ben...    2026.05   79.8
Best single-layer probe   Leave-one-ben...    2026.05   75.1