Out-of-distribution (OOD) Harmful Content Detection on HarmBench
[Chart: AUROC (vs Alpaca) by date. Single point: w_opt, 0.961, Apr 20, 2026. Selectable metrics: AUROC (vs Alpaca), AUROC (vs XSTest), Minimum AUROC.]
Updated 1mo ago
Evaluation Results
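| Method | Details | Date | AUROC (vs Alpaca) | AUROC (vs XSTest) | Minimum AUROC |
|--------|---------|------|-------------------|-------------------|---------------|
| w_opt | strategy=Optimised dis... | 2026.04 | 0.961 | 0.979 | 0.961 |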
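
The page does not define how the three metric columns relate. The sketch below shows one plausible reading, offered as an assumption rather than the benchmark's actual procedure: harmful HarmBench prompts are the positive class, each benign set (Alpaca, XSTest) is a separate negative class, and Minimum AUROC is the smaller of the two per-set AUROCs. That reading is consistent with the reported row, where min(0.961, 0.979) = 0.961. The function names and toy scores are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as half a win (pairwise rank formulation)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def minimum_auroc(
    harmful: Sequence[float], benign_sets: Dict[str, Sequence[float]]
) -> Tuple[float, Dict[str, float]]:
    """Score the harmful set against each benign set and return the worst case.

    Assumption: Minimum AUROC is the minimum over benign reference sets.
    """
    per_set = {name: auroc(harmful, scores) for name, scores in benign_sets.items()}
    return min(per_set.values()), per_set


# Toy detector scores (hypothetical, not benchmark data).
harmful = [0.9, 0.8, 0.7, 0.4]
benign_sets = {
    "alpaca": [0.1, 0.2, 0.3, 0.5],
    "xstest": [0.1, 0.2, 0.3, 0.35],
}
worst, per_set = minimum_auroc(harmful, benign_sets)
print(per_set, worst)
```

Reporting the minimum rather than the mean makes the headline number reflect the hardest benign set, so a detector cannot look strong by separating harmful prompts from only the easier one.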