Code Generation on CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (2nd Pass)
[Chart: Average Accuracy over time. Top result: MEMPROBE, 64.6 (Jun 1, 2026).]
Evaluation Results
Method                               Stream Design   Date      Average Accuracy
MEMPROBE                             Composit...     2026.06   64.6
ExpRAG                               Composit...     2026.06   62.5
Self-invoking                        Composit...     2026.06   60.4
Self-invoking (w/ subtask solution)  Composit...     2026.06   59
ReMem                                Composit...     2026.06   56.3
DC-RS                                Composit...     2026.06   50.7
LangMem                              Composit...     2026.06   45.8
ReAct                                Composit...     2026.06   44.8
ReAct                                Naive           2026.06   44.8
MEMPROBE                             Naive           2026.06   44.5
Mem0                                 Composit...     2026.06   44.4
ReMem                                Naive           2026.06   43.8
LangMem                              Naive           2026.06   42.4
Mem0                                 Naive           2026.06   42.4
AWM                                  Naive           2026.06   41
DC-RS                                Naive           2026.06   39.6
ExpRAG                               Naive           2026.06   39.6
AWM                                  Composit...     2026.06   38.2