
Harmlessness on GPT-4 Evaluation Template T2 (overall)

Win Rate: 89.99

SafeDPO

[Chart: Win Rate by result date (May 26, 2025)]

Evaluation Results

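Method    | Date    | Win Rate | (unlabeled) | (unlabeled)
SafeDPO   | 2025.05 | 89.99    | 7.7         | 2.31
(unknown) | 2025.05 | 84.85    | 6.8         | 8.34
(unknown) | 2025.05 | 69.47    | 22.12       | 8.41
(unknown) | 2025.05 | 57.61    | 19.25       | 23.15
(unknown) | 2025.05 | 33.59    | 24.58       | 41.83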