Harmful Query Transformation on Safe-RLHF (test)
[Chart: Effectiveness of HARMTRANSFORM (36), Dec 9, 2025. Metrics shown: Effectiveness, Preservation.]
Updated 1mo ago
Evaluation Results
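Method                 | Configuration             | Date    | Effectiveness | Preservation
-----------------------|---------------------------|---------|---------------|-------------
HARMTRANSFORM          | Debate Rounds=1, Numbe... | 2025.12 | 36            | 73
SingleLLM              | Framework=Single LLM,...  | 2025.12 | 24            | 77
HARMTRANSFORM-NoDebate | Debate Rounds=0           | 2025.12 | 22            | 73
SingleLLMReflect       | Framework=Single LLM,...  | 2025.12 | 18            | 37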