Safety Dialogue Evaluation on SafeDialBench (Normalized Score, Discriminability)

61.33 Normalized Score

Qwen3-Max-Thinking

Updated 4mo ago

Evaluation Results

Method	Links	Normalized Score	Discriminability
Qwen3-Max-Thinking 2026.03		61.33	18
DeepSeek-V3.2 2026.03		53.95	18
Kimi-K2.5 2026.03		49.86	18
MiniMax-M2.5 2026.03		46.85	18
GLM-5 2026.03		34.02	18