Memory Extraction on BEHEMOTH in-distribution (test)
[Chart: Personalization (MA) by date. Top entry: CluE, 65.72 (Apr 13, 2026). Selectable metrics: Personalization, Problem-Solving, Agentic, and Overall, each reported as MA and RG.]
Updated 1mo ago
Evaluation Results
| Method | Date | Personalization (MA) | Personalization (RG) | Problem-Solving (MA) | Problem-Solving (RG) | Agentic (MA) | Agentic (RG) | Overall (MA) | Overall (RG) |
|---|---|---|---|---|---|---|---|---|---|
| CluE (Backbone=Qwen3-32B) | 2026.04 | 65.72 | 12.34 | 51.85 | 8.39 | 35.24 | 7.22 | 50.01 | 9.04 |
| MemEvolve | 2026.04 | 63.67 | 10.76 | 49.49 | 1.17 | 30.92 | 3.25 | 47.08 | 2.11 |
| Simple | 2026.04 | 58.76 | 0 | 48.76 | 0 | 32.46 | 0 | 46 | 0 |
| GEPA | 2026.04 | 56.59 | 2.16 | 50.05 | 2.86 | 37.23 | 14.08 | 47.52 | 5.06 |
| ACE | 2026.04 | 53.73 | 4.38 | 51.04 | 8.04 | 33.49 | 4.93 | 45.91 | 3.56 |
| No Memory | 2026.04 | 34.15 | - | 46.52 | - | 30.36 | - | 37.84 | - |