Code Generation on CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (1st Pass)
Best result: MEMPROBE, 66.7 Average Accuracy
[Chart: Average Accuracy; x-axis date Jun 1, 2026]
Updated 1d ago
Evaluation Results
Method                               Setting                    Date     Average Accuracy
MEMPROBE                             Stream Design=Composit...  2026.06  66.7
ExpRAG                               Stream Design=Composit...  2026.06  62.5
Self-invoking                        Stream Design=Composit...  2026.06  60.4
Self-invoking (w/ subtask solution)  Stream Design=Composit...  2026.06  59
ReMem                                Stream Design=Composit...  2026.06  58.3
DC-RS                                Stream Design=Composit...  2026.06  48.6
ReAct                                Stream Design=Composit...  2026.06  44.8
ReAct                                Stream Design=Naive        2026.06  44.8
AWM                                  Stream Design=Naive        2026.06  44.5
LangMem                              Stream Design=Composit...  2026.06  44.4
Mem0                                 Stream Design=Composit...  2026.06  43.8
Mem0                                 Stream Design=Naive        2026.06  43.8
MEMPROBE                             Stream Design=Naive        2026.06  43.1
ReMem                                Stream Design=Naive        2026.06  41.7
AWM                                  Stream Design=Composit...  2026.06  39.6
LangMem                              Stream Design=Naive        2026.06  39.6
ExpRAG                               Stream Design=Naive        2026.06  39.6
DC-RS                                Stream Design=Naive        2026.06  35.4