Harmful Query Transformation on Safe-RLHF (test)

Effectiveness: 36

HARMTRANSFORM

[Chart: Effectiveness scores; axis ticks 17.28, 22.14, 27, 31.86; date Dec 9, 2025]
Updated 1mo ago

Evaluation Results

Method    Links
2025.12    36    73
2025.12    24    77
2025.12    22    73
2025.12    18    37