Failure is Feedback: History-Aware Backtracking for Agentic Traversal in Multimodal Graphs

About

Open-domain multimodal document retrieval aims to retrieve specific components (paragraphs, tables, or images) from large, interconnected document corpora. Existing graph-based retrieval approaches typically rely on a uniform similarity metric that overlooks hop-specific semantics, and their rigid, pre-defined plans hinder dynamic error correction. These limitations suggest that a retriever should adapt its reasoning to the evolving context and recover intelligently from dead ends. To address these needs, we propose Failure is Feedback (FiF), which casts subgraph retrieval as a sequential decision process and introduces two key innovations. (i) We introduce a history-aware backtracking mechanism: unlike standard backtracking, which simply reverts the state, our approach piggybacks on the context of failed traversals and leverages insights from previous failures. (ii) We implement an economically rational agentic workflow: unlike conventional agents with static strategies, our orchestrator employs a cost-aware traversal method to dynamically manage the trade-off between retrieval accuracy and inference cost, escalating to intensive LLM-based reasoning only when a prior failure justifies the additional computational investment. Extensive experiments show that FiF achieves state-of-the-art retrieval performance on the MultimodalQA, MMCoQA, and WebQA benchmarks.
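The two ideas in the abstract can be illustrated with a toy traversal. The sketch below is not the FiF implementation: the graph, the scores, the cost units, and every name in it (`GRAPH`, `CHEAP_SCORE`, `Trace`, `cheap_rank`, `expensive_rank`, `traverse`) are assumptions made for illustration. It only shows the general pattern of keeping failed paths as feedback and switching to a costlier, history-aware ranker once a failure has occurred; the "expensive" ranker here is a simple stand-in for LLM-based reasoning.

```python
from dataclasses import dataclass, field

# Hypothetical toy graph: node -> neighbours (documents, tables, images).
GRAPH = {
    "q": ["doc_a", "doc_b"],
    "doc_a": ["table_a"],
    "doc_b": ["table_a", "image_b"],
    "table_a": [],
    "image_b": [],
}
# Hypothetical similarity scores from a cheap, uniform metric.
CHEAP_SCORE = {"doc_a": 0.9, "doc_b": 0.6, "table_a": 0.9, "image_b": 0.8}
CHEAP_COST, EXPENSIVE_COST = 1, 10  # assumed cost units per ranking call


@dataclass
class Trace:
    failures: list = field(default_factory=list)  # failed paths kept as feedback
    cost: int = 0                                 # accumulated ranking cost


def cheap_rank(candidates, trace):
    """Rank by the cheap score only; ignores history."""
    trace.cost += CHEAP_COST
    return sorted(candidates, key=CHEAP_SCORE.get, reverse=True)


def expensive_rank(candidates, trace):
    """Stand-in for costly reasoning: demote nodes seen on failed paths."""
    trace.cost += EXPENSIVE_COST
    failed = {node for path in trace.failures for node in path}
    return sorted(candidates, key=lambda n: (n in failed, -CHEAP_SCORE[n]))


def traverse(node, is_target, trace, path=(), escalate=False):
    """Depth-first search that records dead ends and escalates after a failure."""
    path = path + (node,)
    if is_target(node):
        return path
    children = GRAPH[node]
    if not children:
        trace.failures.append(path)  # dead end: remember the whole failed path
        return None
    rank = expensive_rank if escalate else cheap_rank
    for child in rank(children, trace):
        # Escalate only once at least one failure has been observed.
        found = traverse(child, is_target, trace, path, bool(trace.failures))
        if found:
            return found
    return None


trace = Trace()
found = traverse("q", lambda n: n == "image_b", trace)
print(found, trace.cost, trace.failures)
```

In this toy run the cheap ranker first follows `doc_a` into a dead end at `table_a`. After backtracking, the costlier ranker is used at `doc_b`, where it demotes `table_a` (already seen on a failed path) even though its cheap score is higher than that of `image_b`.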

Joohyung Yun, Doyup Lee, Wook-Shin Han • 2026

Related benchmarks

Task	Dataset	Result
Multimodal Question Answering	MULTIMODALQA Doc	EM 65.15	12
Open-domain multimodal component retrieval	MULTIMODALQA Doc	R@1 43.91	12
Multimodal Retrieval	MULTIMODALQA Doc (test)	Total Time (ms) 1.15e+5	10
Multimodal Question Answering	MMCOQA Doc	EM 51.11	6
Open-domain multimodal component retrieval	MMCOQA Doc	R@1 46.31	6
Multimodal Retrieval	MMCOQA Doc (test)	Total Time (ms) 1.05e+5	5

