HIPPO: Enhancing the Table Understanding Capability of LLMs through Hybrid-Modal Preference Optimization

About

Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. Recent methods employ Multi-modal Large Language Models (MLLMs) to address table-related tasks across various modalities of table representations. However, existing studies mainly focus on exploring the table understanding ability of MLLMs using unimodal representations, which limits further exploration of multi-modal representations to enable more effective table reasoning. To better capture structural semantics from the tabular data, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and image, optimizing MLLMs by learning more comprehensive table information from these multiple modalities. Specifically, HIPPO samples MLLM responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during Direct Preference Optimization (DPO) training. Experiments on table question answering and table fact verification tasks demonstrate the effectiveness of HIPPO, achieving a 4% improvement over various table reasoning models. Further analysis reveals that HIPPO not only enhances the table reasoning capability based on unimodal representations but also facilitates the extraction of complementary semantics across modalities. The code is available at https://github.com/NEUIR/HIPPO.
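
For readers unfamiliar with Direct Preference Optimization, the per-pair objective that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard DPO loss only, not HIPPO's implementation: the function name, the argument names, and the default `beta` are assumptions chosen for the example, and the hybrid-modal sampling of chosen and rejected responses described above is not modeled here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model. All names and
    the default beta are illustrative assumptions, not taken from HIPPO.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable form for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model agree on both responses, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen response than the reference does.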

Haolan Wang, Zhenghao Liu, Xinze Li, Xiaocui Yang, Yu Gu, Yukun Yan, Qi Shi, Fangfang Li, Chong Chen, Ge Yu • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Text-based Visual Question Answering	TextVQA	Accuracy	75.89	984
Science Question Answering	ScienceQA	Accuracy	96.33	916
Visual Hallucination Evaluation	HallusionBench	Accuracy	63.41	156
Table Fact Verification	TabFact (test)	Accuracy	82.27	146
Table Fact Verification	TabFact	Accuracy	0.8251	110
Table Question Answering	WTQ	Accuracy	62.73	109
Table Question Answering	TabMWP	Accuracy	87.5	97
Table Question Answering	TAT-QA	Accuracy	62.82	45
Table Fact Verification	InfoTabs	Accuracy	78.22	45
Table Question Answering	HiTab	Accuracy	63	30

