
Safety Dialogue Evaluation on SafeDialBench (Normalized Score, Discriminability)

Normalized Score: 61.33

Qwen3-Max-Thinking

[Chart: Normalized Score; Mar 24, 2026]
Updated 24d ago

Evaluation Results

Method   Links
2026.03   61.33   18
2026.03   53.95   18
2026.03   49.86   18
2026.03   46.85   18
2026.03   34.02   18