Long-context dialogue evaluation on LoCoMo
Top result: GLM-5, Normalized Score 69.26
[Chart: Normalized Score, Mar 24, 2026; metrics shown: Normalized Score, Discriminability]
Updated 24d ago
Evaluation Results
Method              Method (settings)          Links    Normalized Score  Discriminability
GLM-5               formatting=multi-turn,...  2026.03  69.26             0.15
MiniMax-M2.5        formatting=multi-turn,...  2026.03  66.04             0.15
Qwen3-Max-Thinking  formatting=multi-turn,...  2026.03  62.11             0.15
DeepSeek-V3.2       formatting=multi-turn,...  2026.03  59.25             0.15
Kimi-K2.5           formatting=multi-turn,...  2026.03  42.91             0.15