TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

About

We study fact-level repair for multimodal generation, where a fluent output may contain specific facts that are not supported by the input. Existing inference-time repair methods often generate feedback by jointly conditioning on the input and the current output. This design has two limitations: hallucinated claims in the output can bias the model's interpretation of the input, and free-form feedback cannot be ranked or scheduled at the fact level. We present TIGER, an inference-time framework that redesigns feedback for localized repair. TIGER independently extracts an observation graph from the input and a claim graph from the current output, then assigns each claim a graph-conditioned risk score based on support and conflict. The model repairs selected high-risk claims while keeping the backbone frozen. We provide a convergence analysis showing that the expected total risk decreases geometrically to an explicit asymptotic bound under mild assumptions. Experiments across four cross-modal paths (image-to-text, image+text-to-text, audio-to-text, and video-to-text) show that TIGER reduces unsupported content while preserving task quality. The gains hold across multiple backbones, and a CrisisFACTS case study suggests that the same repair mechanism can improve grounding in multi-source settings.
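
The scoring-and-selection step described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the `Fact` triple format, the `risk_score` and `select_for_repair` helpers, the weights, the threshold, and the repair budget are all hypothetical choices made for illustration, and the sketch assumes each (subject, relation) pair admits only one object so that a mismatch can be treated as a conflict. Graph extraction and the repair call to the frozen backbone are left out.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical edge format shared by the observation and claim graphs."""
    subject: str
    relation: str
    obj: str


def risk_score(claim, observations, w_conflict=1.0, w_unsupported=0.5):
    """Toy risk: 0.0 if supported, w_conflict if contradicted, else w_unsupported.

    Assumption: a (subject, relation) pair has a single valid object, so an
    observation with the same pair but a different object counts as a conflict.
    """
    if claim in observations:
        return 0.0
    conflict = any(
        o.subject == claim.subject
        and o.relation == claim.relation
        and o.obj != claim.obj
        for o in observations
    )
    return w_conflict if conflict else w_unsupported


def select_for_repair(claims, observations, threshold=0.4, budget=2):
    """Rank claims by risk and keep at most `budget` claims at or above `threshold`."""
    scored = [(risk_score(c, observations), c) for c in claims]
    risky = [(r, c) for r, c in scored if r >= threshold]
    risky.sort(key=lambda rc: rc[0], reverse=True)  # highest risk first
    return risky[:budget]


# Observation graph (from the input) and claim graph (from the current output).
observations = {
    Fact("dog", "color", "brown"),
    Fact("dog", "on", "sofa"),
}
claims = [
    Fact("dog", "color", "black"),  # contradicts an observation
    Fact("dog", "on", "sofa"),      # supported
    Fact("cat", "on", "floor"),     # neither supported nor contradicted
]

for risk, claim in select_for_repair(claims, observations):
    print(f"{risk:.1f}  {claim.subject} {claim.relation} {claim.obj}")
```

Because the two graphs are built separately in this sketch, a hallucinated claim cannot influence what counts as an observation, and each claim carries its own score, so claims can be ranked and scheduled individually.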

Kaixiang Zhao, Tianrun Yu, Shawn Huang, Porter Jenkins, Yushun Dong, Amanda Hughes • 2026

Related benchmarks

Task                                         Dataset               Result                    Rank
Hallucination Evaluation                     AMBER                 --                        222
Image-to-text                                COCO                  CHAIRs: 0.05              31
Image+Text-to-Text Hallucination Evaluation  MMHal-Bench           BERT Score: 79            18
Image-to-Text Hallucination Evaluation       COCO                  CHAIRs Score: 0.05        18
Video-to-Text                                VideoHallucer (test)  Hallucination Rate: 1     6
Grounded Situational Summarization           CrisisFACTS           Hurricane Precision: 100  2
