Multi-domain knowledge reasoning on HLE 500-question ablation

57.3 Success Rate (Last)

MemRL

Updated 3mo ago

Evaluation Results

Method (date)	Links	Success Rate (Last)	(unlabeled)
MemRL 2026.03		57.3	61.3
MemP 2026.03		52.8	58.2
Mem0 2026.03		51.2	56
RAG 2026.03		50	54.8
Self-RAG 2026.03		48.8	54.8
APEX-EM: Entity graph (A3, E10) 2026.03		48	53.3
APEX-EM: Full memory (A2, E10) 2026.03		46.8	52.3
APEX-EM: Semantic only (A1, E10) 2026.03		45.6	52.3
APEX-EM: Judge + iteration (A5, E10) 2026.03		40.4	52.9
No Memory 2026.03		35.7	-
APEX-EM: No Memory (A0) 2026.03		25.2	-
APEX-EM: Memory, no judge (A4, E10) 2026.03		19.4	47.9
Pass@10 2026.03		-	52.4
Reflexion 2026.03		-	53