Safety Dialogue Evaluation on SafeDialBench (Normalized Score, Discriminability)
[Chart: Normalized Score over time. Top result: Qwen3-Max-Thinking, 61.33 (Mar 24, 2026).]
Evaluation Results
| Method | Configuration | Date | Normalized Score | Discriminability |
|---|---|---|---|---|
| Qwen3-Max-Thinking | formatting=multi-turn,... | 2026.03 | 61.33 | 18 |
| DeepSeek-V3.2 | formatting=multi-turn,... | 2026.03 | 53.95 | 18 |
| Kimi-K2.5 | formatting=multi-turn,... | 2026.03 | 49.86 | 18 |
| MiniMax-M2.5 | formatting=multi-turn,... | 2026.03 | 46.85 | 18 |
| GLM-5 | formatting=multi-turn,... | 2026.03 | 34.02 | 18 |