Code Generation on BigCodeBench Instruct Full (train)

83.3 Last SR

APEX-EM + Opus Judge (A5, E10)

Updated 3mo ago

Evaluation Results

Method	Links	Last SR
APEX-EM + Opus Judge (A5, E10) 2026.03		83.3	84
APEX-EM (EG2, E10) 2026.03		81.1	82.7
APEX-EM (E10) 2026.03		81.1	81.5
MemRL 2026.03		59.5	62.7
MemP 2026.03		57.8	60.2
No Memory (A0) 2026.03		53.9	-
Mem0 2026.03		53	57.7
Self-RAG 2026.03		50	55.8
No Memory 2026.03		48.5	-
RAG 2026.03		47.9	53
Pass@10 2026.03		-	57.7
Reflexion 2026.03		-	58.2