LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval

About

Multimodal document retrieval aims to retrieve query-relevant components from documents composed of textual, tabular, and visual elements. An effective multimodal retriever must handle two main challenges: (1) mitigating the effect of irrelevant content introduced by fixed, single-granularity retrieval units, and (2) supporting multihop reasoning by capturing semantic relationships among components within and across documents. To address these challenges, we propose LILaC, a multimodal retrieval framework with two core innovations. First, we introduce a layered component graph that explicitly represents multimodal information at two layers, one coarse-grained and one fine-grained, enabling efficient yet precise reasoning. Second, we develop a late-interaction-based subgraph retrieval method, an edge-based approach that first identifies coarse-grained nodes for efficient candidate generation and then performs fine-grained reasoning via late interaction. Extensive experiments demonstrate that LILaC achieves state-of-the-art retrieval performance on all five benchmarks, notably without additional fine-tuning. We make the artifacts publicly available at github.com/joohyung00/lilac.
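The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the graph layout, the function names (`late_interaction_score`, `retrieve`), the pooled-query coarse stage, and the toy 2-D embeddings are all assumptions made for illustration, and the edge-based subgraph step is not modeled. It only shows generic coarse candidate selection followed by MaxSim-style late-interaction reranking over fine-grained vectors.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = sqrt(dot(v, v))
    return [x / n for x in v] if n else list(v)


def late_interaction_score(query_vecs, fine_vecs):
    """MaxSim-style score: each query vector is matched to its most similar
    fine-grained vector, and the per-vector maxima are summed."""
    return sum(max(dot(q, f) for f in fine_vecs) for q in query_vecs)


def retrieve(query_vecs, graph, top_coarse=2):
    """Two-stage retrieval over a hypothetical two-layer structure.

    graph maps a node id to {"coarse": vec, "fine": [vec, ...]} (assumed layout).
    """
    # Stage 1 (coarse): score each node's single coarse vector against a pooled query.
    pooled = normalize([sum(col) for col in zip(*query_vecs)])
    ranked = sorted(graph, key=lambda n: dot(pooled, graph[n]["coarse"]), reverse=True)
    candidates = ranked[:top_coarse]
    # Stage 2 (fine): rerank only the candidates with late interaction.
    scored = [(late_interaction_score(query_vecs, graph[n]["fine"]), n) for n in candidates]
    return [n for _, n in sorted(scored, reverse=True)]


# Toy data (hypothetical): three components with unit-length 2-D embeddings.
graph = {
    "table_1": {"coarse": [1.0, 0.0], "fine": [[1.0, 0.0], [0.6, 0.8]]},
    "image_1": {"coarse": [0.0, 1.0], "fine": [[0.0, 1.0]]},
    "para_1": {"coarse": [0.8, 0.6], "fine": [[0.8, 0.6], [0.0, 1.0]]},
}
query_vecs = [[1.0, 0.0], [0.6, 0.8]]

result = retrieve(query_vecs, graph, top_coarse=2)
print(result)
```

In this toy setup the coarse stage ranks `para_1` first, but the fine-grained stage moves `table_1` ahead because each query vector finds a closer match among its fine-grained vectors. `image_1` is pruned by the coarse stage and never scored at the fine level, which is where the efficiency of such a scheme would come from.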

Joohyung Yun, Doyup Lee, Wook-Shin Han • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Document Question Answering | SlideVQA (test) | EM 55.57 | 19 |
| Multimodal Question Answering | MULTIMODALQADoc | EM 57.78 | 12 |
| Open-domain multimodal component retrieval | MULTIMODALQADoc | R@1 33.59 | 12 |
| Multimodal Retrieval | MULTIMODALQA Doc (test) | Total Time (ms) 1.92e+4 | 10 |
| End-to-end Question Answering | MP-DocVQA (test) | EM 65.48 | 7 |
| End-to-end Question Answering | InfoVQA (test) | EM 60.91 | 7 |
| End-to-end Question Answering | MultimodalQA (test) | EM 44.57 | 7 |
| End-to-end Question Answering | MMCoQA (test) | EM 36.31 | 7 |
| Retrieval | MP-DocVQA | R@3 83.59 | 6 |
| Retrieval | SlideVQA | R@3 92.81 | 6 |
Showing 10 of 16 rows
