Mathematical Dialogue on MathChat
Top result: DeepSeek-V3.2, Normalized Score 77.87 (Mar 24, 2026)
[Chart: Normalized Score, with an alternate Discriminability view]
Updated 24d ago
Evaluation Results
Method              Configuration              Date     Normalized Score  Discriminability
DeepSeek-V3.2       formatting=multi-turn,...  2026.03  77.87             9
Qwen3-Max-Thinking  formatting=multi-turn,...  2026.03  77.43             9
MiniMax-M2.5        formatting=multi-turn,...  2026.03  69.58             9
GLM-5               formatting=multi-turn,...  2026.03  67.65             9
Kimi-K2.5           formatting=multi-turn,...  2026.03  61.41             9