Harmful Content Detection on BeaverTails Harmful (held-out target labels)

0.793 AUROC

Quotient Transfer

[Chart: y-axis ticks 0.62556, 0.66903, 0.7125, 0.75597; x-axis label May 12, 2026]
Updated 21d ago

Evaluation Results

Date      AUROC
2026.05   0.793
2026.05   0.791
2026.05   0.781
2026.05   0.773
2026.05   0.772
          0.632