HIPPO: Enhancing the Table Understanding Capability of LLMs through Hybrid-Modal Preference Optimization

About

Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. Recent methods employ Multi-modal Large Language Models (MLLMs) to address table-related tasks across various modalities of table representations. However, existing studies mainly focus on exploring the table understanding ability of MLLMs using unimodal representations, which limits further exploration of multi-modal representations to enable more effective table reasoning. To better capture structural semantics from the tabular data, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and image, optimizing MLLMs by learning more comprehensive table information from these multiple modalities. Specifically, HIPPO samples MLLM responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during Direct Preference Optimization (DPO) training. Experiments on table question answering and table fact verification tasks demonstrate the effectiveness of HIPPO, achieving a 4% improvement over various table reasoning models. Further analysis reveals that HIPPO not only enhances the table reasoning capability based on unimodal representations but also facilitates the extraction of complementary semantics across modalities. The code is available at https://github.com/NEUIR/HIPPO.
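The abstract names Direct Preference Optimization (DPO) as the training objective. As a rough illustration, the standard DPO loss for a single preference pair can be sketched as below. This is the generic DPO formulation, not code from the HIPPO repository: the function name, argument names, and the `beta` default are assumptions made for illustration. How HIPPO actually builds its chosen/rejected pairs from text, image, and hybrid table inputs is not specified here and is left out of the sketch.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # evaluated in a numerically stable way for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With all four log-probabilities equal, the margin is zero and the loss is log 2. The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does. In a hybrid-modal setting, the log-probabilities would presumably be computed with the table supplied as text, as an image, or as both, but that wiring is an assumption and not shown.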

Haolan Wang, Zhenghao Liu, Xinze Li, Xiaocui Yang, Yu Gu, Yukun Yan, Qi Shi, Fangfang Li, Chong Chen, Ge Yu • 2025

Related benchmarks

Task                                  Dataset         Metric         Result  Rank
Text-based Visual Question Answering  TextVQA         Accuracy       75.89   496
Science Question Answering            ScienceQA       Accuracy       96.33   229
Table Question Answering              WTQ             Accuracy       62.73   101
Table Fact Verification               TabFact (test)  Accuracy       82.27   98
Table Question Answering              TabMWP          Accuracy       87.5    53
Table Fact Verification               TabFact         Accuracy       0.8251  36
Table Fact Verification               InfoTabs        Average Score  65.46   30
Table Question Answering              TAT-QA          Average Score  65.46   30
Table Question Answering              HiTab           Accuracy       63      30
Hallucination Evaluation              CRPE relation   Accuracy       76.37   23
