Failure is Feedback: History-Aware Backtracking for Agentic Traversal in Multimodal Graphs
About
Open-domain multimodal document retrieval aims to retrieve specific components (paragraphs, tables, or images) from large, interconnected document corpora. Existing graph-based retrieval approaches typically rely on a uniform similarity metric that overlooks hop-specific semantics, and their rigid, predefined plans hinder dynamic error correction. These limitations suggest that a retriever should adapt its reasoning to the evolving context and recover intelligently from dead ends. To address these needs, we propose Failure is Feedback (FiF), which casts subgraph retrieval as a sequential decision process and introduces two key innovations. (i) We introduce a history-aware backtracking mechanism: unlike standard backtracking, which simply reverts the state, our approach reuses the context of failed traversals, leveraging insights from previous failures. (ii) We implement an economically rational agentic workflow. Unlike conventional agents with static strategies, our orchestrator employs a cost-aware traversal method to dynamically manage the trade-off between retrieval accuracy and inference cost, escalating to intensive LLM-based reasoning only when a prior failure justifies the additional computational investment. Extensive experiments show that FiF achieves state-of-the-art retrieval on the MultimodalQA, MMCoQA, and WebQA benchmarks.
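To make the two ideas concrete, the following is a minimal sketch of history-aware backtracking with cost-based escalation on a toy graph. It is not the authors' implementation: the function `traverse`, the `Ranker` signature, the cost constants, the node naming scheme (`modality:id`), and both ranking heuristics are hypothetical and chosen only for illustration. The cheap ranker stands in for a similarity lookup, and the costly ranker stands in for LLM-based reasoning that reads the failure history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Path = Tuple[str, ...]
# A ranker orders candidate next nodes given the current path and past failures.
Ranker = Callable[[Path, List[str], List[Path]], List[str]]

CHEAP_COST = 1.0    # hypothetical cost units (e.g. an embedding lookup)
COSTLY_COST = 10.0  # hypothetical cost units (e.g. an LLM call)


@dataclass
class Result:
    path: Optional[Path]
    spent: float
    failures: List[Path] = field(default_factory=list)


def traverse(graph: Dict[str, List[str]], start: str,
             is_target: Callable[[str], bool],
             cheap_rank: Ranker, costly_rank: Ranker,
             max_hops: int = 3, budget: float = 50.0) -> Result:
    failures: List[Path] = []  # dead-end paths, kept as feedback for later ranking
    spent = 0.0

    def visit(path: Path) -> Optional[Path]:
        nonlocal spent
        if is_target(path[-1]):
            return path
        remaining = [n for n in graph.get(path[-1], []) if n not in path]
        if len(path) - 1 >= max_hops:
            remaining = []
        use_costly = False  # start cheap; escalate only after a failure below
        while remaining:
            if use_costly and spent + COSTLY_COST <= budget:
                ranker, cost = costly_rank, COSTLY_COST
            else:
                ranker, cost = cheap_rank, CHEAP_COST
            if spent + cost > budget:
                break
            spent += cost
            nxt = ranker(path, remaining, failures)[0]
            found = visit(path + (nxt,))
            if found is not None:
                return found
            # Backtrack: the failed child is already recorded in `failures`.
            remaining = [n for n in remaining if n != nxt]
            use_costly = True
        failures.append(path)
        return None

    return Result(visit((start,)), spent, failures)


# Toy rankers (assumptions): cheap ignores history, costly demotes failed modalities.
def cheap_rank(path: Path, cands: List[str], failures: List[Path]) -> List[str]:
    return sorted(cands)


def costly_rank(path: Path, cands: List[str], failures: List[Path]) -> List[str]:
    failed_modalities = {f[-1].split(":")[0] for f in failures}
    return sorted(cands, key=lambda n: (n.split(":")[0] in failed_modalities, n))


graph = {
    "q": ["img:1", "img:2", "tab:1"],
    "img:1": [],
    "img:2": [],
    "tab:1": ["para:answer"],
    "para:answer": [],
}
result = traverse(graph, "q", lambda n: n == "para:answer", cheap_rank, costly_rank)
print(result.path, result.spent, result.failures)
```

In this toy run, the cheap ranker first picks `img:1`, which is a dead end. Only then does the traversal pay for the costly ranker, which uses the recorded failure to skip `img:2` and try `tab:1` instead. A plain backtracking search with the cheap ranker alone would have visited `img:2` next.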
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multimodal Question Answering | MULTIMODALQA Doc | EM | 65.15 | 12 |
| Open-domain multimodal component retrieval | MULTIMODALQA Doc | R@1 | 43.91 | 12 |
| Multimodal Retrieval | MULTIMODALQA Doc (test) | Total Time (ms) | 1.15e+5 | 10 |
| Multimodal Question Answering | MMCOQA Doc | EM | 51.11 | 6 |
| Open-domain multimodal component retrieval | MMCOQA Doc | R@1 | 46.31 | 6 |
| Multimodal Retrieval | MMCOQA Doc (test) | Total Time (ms) | 1.05e+5 | 5 |