Chat Memory Reasoning on Chat Memory
[Chart: Accuracy. Top result: Qwen3-235B (w/ Pensieve), Accuracy 67, Feb 12, 2026]
Evaluation Results
| Method                   | Context | Date    | Accuracy |
|--------------------------|---------|---------|----------|
| Qwen3-235B (w/ Pensieve) | 256K    | 2026.02 | 67       |
| StateLM-14B-RL           | 32K     | 2026.02 | 64.47    |
| StateLM-14B              | 32K     | 2026.02 | 64.4     |
| StateLM-8B-RL            | 32K     | 2026.02 | 59.73    |
| StateLM-4B               | 32K     | 2026.02 | 59.33    |
| RL-MemoryAgent-14B       | 32K     | 2026.02 | 59       |
| StateLM-8B               | 32K     | 2026.02 | 58.93    |
| Qwen3-14B                | 128K    | 2026.02 | 54.07    |
| Qwen3-8B                 | 128K    | 2026.02 | 45.4     |
| RL-MemoryAgent-7B        | 32K     | 2026.02 | 40.6     |
| Qwen3-4B                 | 128K    | 2026.02 | 39.53    |
| ReadAgent-14B            | 32K     | 2026.02 | 14.6     |
| ReadAgent-8B             | 32K     | 2026.02 | 0        |