ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
About
Narrative comprehension of long stories and novels is challenging because of their intricate plotlines and the entangled, often evolving relations among characters and entities. Given LLMs' diminished reasoning over extended context and their high computational cost, retrieval-based approaches play a pivotal role in practice. However, traditional RAG methods can fall short because their retrieval is stateless and single-step, which often fails to capture the dynamic, interconnected relations within long-range context. In this work, we propose ComoRAG, built on the principle that narrative reasoning is not a one-shot process but a dynamic, evolving interplay between acquiring new evidence and consolidating past knowledge, analogous to how human cognition reasons with memory-related signals in the brain. Specifically, when it encounters a reasoning impasse, ComoRAG enters iterative reasoning cycles while interacting with a dynamic memory workspace. In each cycle, it generates probing queries to open new exploratory paths, then integrates the newly retrieved evidence into a global memory pool, thereby supporting the emergence of a coherent context for resolving the query. Across four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines, with consistent relative gains of up to 11% over the strongest baseline. Further analysis reveals that ComoRAG is particularly advantageous for complex queries requiring global context comprehension, offering a principled, cognitively motivated paradigm for retrieval-based stateful reasoning. Our framework is publicly available at https://github.com/EternityJune25/ComoRAG.
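The reasoning cycle described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the names `MemoryPool`, `stateful_answer`, `retrieve`, `try_answer`, and `make_probes` are all hypothetical, the retriever is plain word overlap, and the rule-based answerer and probe generator stand in for what would presumably be LLM calls.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class MemoryPool:
    """Hypothetical global memory: evidence accumulated across cycles."""
    evidence: List[str] = field(default_factory=list)

    def add(self, passages: List[str]) -> None:
        for p in passages:
            if p not in self.evidence:  # skip evidence already stored
                self.evidence.append(p)


def stateful_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    try_answer: Callable[[str, List[str]], Optional[str]],
    make_probes: Callable[[str, List[str]], List[str]],
    max_cycles: int = 3,
) -> Tuple[Optional[str], int]:
    """Answer a query; on an impasse, probe for new evidence and retry."""
    memory = MemoryPool()
    memory.add(retrieve(query))                  # ordinary single-step retrieval
    answer = try_answer(query, memory.evidence)
    cycles = 0
    while answer is None and cycles < max_cycles:    # reasoning impasse
        for probe in make_probes(query, memory.evidence):
            memory.add(retrieve(probe))          # integrate new evidence
        answer = try_answer(query, memory.evidence)
        cycles += 1
    return answer, cycles


# --- Toy stand-ins (assumptions for illustration only) ---
CORPUS = [
    "Anna hid the letter in the old clock.",
    "The old clock was moved to the attic in winter.",
    "Tom likes apples.",
]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str) -> List[str]:
    # Return the single passage sharing the most words with the query.
    return [max(CORPUS, key=lambda p: len(tokens(p) & tokens(query)))]


def try_answer(query: str, evidence: List[str]) -> Optional[str]:
    # Stand-in for an LLM reader: answers only once the attic is mentioned.
    return "attic" if any("attic" in e.lower() for e in evidence) else None


def make_probes(query: str, evidence: List[str]) -> List[str]:
    # Stand-in for LLM-generated probing queries conditioned on memory.
    if any("clock" in e.lower() for e in evidence):
        return ["What happened to the old clock?"]
    return []


answer, cycles = stateful_answer(
    "Where is the letter now?", retrieve, try_answer, make_probes
)
print(answer, cycles)
```

In this toy run the first retrieval surfaces only the passage about the letter, so the answerer reaches an impasse; the probing query about the clock then pulls in the second passage, and the question is resolved from the combined memory after one extra cycle.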
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Long narrative understanding QA | NoCha | -- | -- | 32 |
| Question Answering | GraphRAG-Benchmark MEDICAL | Fact Retrieval (FR) | 58.92 | 15 |
| Long narrative understanding QA | NarrativeQA | Accuracy | 54 | 14 |
| Generative sense-making QA | LongBench | Comprehensiveness | 0.6218 | 14 |
| Long narrative understanding QA | Prelude | Accuracy | 54.07 | 14 |
| Question Answering | 2WikiMultiHopQA 1,000 queries (test) | EM | 48.4 | 13 |
| Question Answering | PopQA 1,000 queries (test) | EM | 45.8 | 10 |
| Question Answering | NQ 1,000 queries (test) | EM | 38.5 | 10 |
| Question Answering | MuSiQue 1,000 queries (test) | EM | 24.5 | 10 |
| Question Answering | HotpotQA 1,000 queries (test) | EM | 39.9 | 10 |