Harmlessness Evaluation on HH-RLHF
[Chart: Harmlessness Rate. Best result: 94.5, SFT + TTL + DPO + TPO, May 8, 2026.]
Updated 23d ago
Evaluation Results
Method                  Details                    Date     Harmlessness Rate
SFT + TTL + DPO + TPO   Initialization=SFT + T...  2026.05  94.5
DPO + Topo-TPO          -                          2026.05  94.1
DPO + TPO               -                          2026.05  93.5
SFT + TTL + DPO         Initialization=SFT + T...  2026.05  92.8
DPO                     -                          2026.05  90.2
SFT (no TTL) + DPO      Initialization=SFT (no...  2026.05  90.2