LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
About
Multimodal document retrieval aims to retrieve query-relevant components from documents composed of textual, tabular, and visual elements. An effective multimodal retriever must handle two main challenges: (1) mitigating the effect of irrelevant content introduced by fixed, single-granularity retrieval units, and (2) supporting multihop reasoning by capturing semantic relationships among components within and across documents. To address these challenges, we propose LILaC, a multimodal retrieval framework featuring two core innovations. First, we introduce a layered component graph that explicitly represents multimodal information at two layers, one coarse-grained and one fine-grained, facilitating efficient yet precise reasoning. Second, we develop a late-interaction-based subgraph retrieval method, an edge-based approach that first identifies coarse-grained nodes for efficient candidate generation and then performs fine-grained reasoning via late interaction. Extensive experiments demonstrate that LILaC achieves state-of-the-art retrieval performance on all five benchmarks, notably without additional fine-tuning. We make the artifacts publicly available at github.com/joohyung00/lilac.
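The two-stage retrieval idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the graph layout, the function names (`maxsim`, `retrieve`), the parameters (`coarse_k`, `top_k`), the 2-dimensional embeddings, and the choice of a ColBERT-style MaxSim sum as the late-interaction score are all assumptions made for illustration, since the abstract does not specify them.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, sub_vecs):
    """Assumed late-interaction score: each query vector is matched to its
    best-scoring fine-grained vector, and the maxima are summed."""
    return sum(max(dot(q, s) for s in sub_vecs) for q in query_tokens)


def retrieve(query_vec, query_tokens, graph, coarse_k=2, top_k=1):
    """Hypothetical two-stage, edge-based retrieval over a layered graph."""
    # Stage 1: cheap candidate generation on the coarse layer.
    ranked = sorted(
        graph["coarse"],
        key=lambda n: dot(query_vec, graph["coarse"][n]),
        reverse=True,
    )
    candidates = set(ranked[:coarse_k])

    # Stage 2: score each edge touching a candidate by late interaction
    # over the fine-grained vectors of both endpoints.
    scored = []
    for u, v in graph["edges"]:
        if u in candidates or v in candidates:
            fine = graph["fine"][u] + graph["fine"][v]
            scored.append((maxsim(query_tokens, fine), (u, v)))
    scored.sort(reverse=True)
    return [edge for _, edge in scored[:top_k]]


# Toy graph (all values invented): coarse layer holds one vector per
# component, fine layer holds vectors for its subcomponents.
graph = {
    "coarse": {"para_A": [1.0, 0.0], "table_B": [0.0, 1.0], "image_C": [0.7, 0.7]},
    "fine": {
        "para_A": [[1.0, 0.0], [0.9, 0.1]],
        "table_B": [[0.0, 1.0], [0.1, 0.9]],
        "image_C": [[0.6, 0.8]],
    },
    "edges": [("para_A", "table_B"), ("table_B", "image_C")],
}

best_edges = retrieve([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], graph)
print(best_edges)
```

In this toy setup the query has two parts, one matching the paragraph and one matching the table, so the edge connecting those two components scores highest even though the table alone would not be selected by the coarse stage. Scoring edges rather than single nodes is what lets a multihop query surface both endpoints together.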
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Document Question Answering | SlideVQA (test) | EM | 55.57 | 19 |
| Multimodal Question Answering | MULTIMODALQADoc | EM | 57.78 | 12 |
| Open-domain multimodal component retrieval | MULTIMODALQADoc | R@1 | 33.59 | 12 |
| Multimodal Retrieval | MULTIMODALQA Doc (test) | Total Time (ms) | 1.92e+4 | 10 |
| End-to-end Question Answering | MP-DocVQA (test) | EM | 65.48 | 7 |
| End-to-end Question Answering | InfoVQA (test) | EM | 60.91 | 7 |
| End-to-end Question Answering | MultimodalQA (test) | EM | 44.57 | 7 |
| End-to-end Question Answering | MMCoQA (test) | EM | 36.31 | 7 |
| Retrieval | MP-DocVQA | R@3 | 83.59 | 6 |
| Retrieval | SlideVQA | R@3 | 92.81 | 6 |