
Table as a Modality for Large Language Models

About

To transfer the remarkable successes of Large Language Models (LLMs) to tabular data, which is widely deployed in practice, the community has made numerous efforts to generalize LLMs to table reasoning tasks. Nevertheless, through a probing experiment on our proposed StructQA benchmark, we show that even the most advanced LLMs (such as the GPT family) may still fall short on tabular data. Specifically, the prevailing scheme simply serializes the tabular data together with its meta information and feeds the result into the LLM as plain text. We argue that the resulting loss of structural information is the root of this shortcoming. We therefore propose TAMO, which treats tables as an independent modality integrated with the text tokens. TAMO is a multimodal framework consisting of a hypergraph neural network serving as a global table encoder, seamlessly integrated with a mainstream LLM. Empirical results on several benchmarks, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, demonstrate significant improvements in generalization, with an average relative gain of 42.65%.
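To make the core idea concrete, below is a minimal, dependency-free sketch (not the authors' code) of treating a table as a hypergraph modality: cells are nodes, every row and every column is a hyperedge, a few rounds of mean-pooled message passing give each cell an embedding that carries global row/column structure, and the table embeddings are then prepended to the text token embeddings as an extra input modality. All function names, the toy embedding, and the dimension are illustrative assumptions.

```python
import random

DIM = 8  # toy embedding size; real models use hundreds of dimensions

def embed(token: str) -> list[float]:
    """Deterministic toy embedding for a token (stands in for a learned one)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def mean(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]

def hypergraph_encode(table: list[list[str]], rounds: int = 2) -> list[list[float]]:
    """Hypergraph message passing: rows and columns are hyperedges over cells."""
    n_rows, n_cols = len(table), len(table[0])
    cells = {(r, c): embed(table[r][c])
             for r in range(n_rows) for c in range(n_cols)}
    # Hyperedges: each row and each column groups its member cells.
    edges = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    edges += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    for _ in range(rounds):
        # Edge step: pool member-cell states into one state per hyperedge.
        edge_states = [mean([cells[m] for m in e]) for e in edges]
        # Node step: each cell averages itself with its incident hyperedges,
        # so every cell sees its whole row and whole column.
        cells = {
            key: mean([cells[key]] +
                      [s for e, s in zip(edges, edge_states) if key in e])
            for key in cells
        }
    return [cells[k] for k in sorted(cells)]

def build_inputs(table: list[list[str]], question: str) -> list[list[float]]:
    """Concatenate table-node embeddings with text token embeddings,
    mimicking a multimodal input sequence handed to an LLM."""
    return hypergraph_encode(table) + [embed(tok) for tok in question.split()]
```

For a 3x2 table and a 6-token question, `build_inputs` yields a sequence of 12 vectors: 6 structure-aware cell embeddings followed by 6 text embeddings. In TAMO the cell encoder is a trained hypergraph neural network and the text side is the LLM's own tokenizer and embedding table; this sketch only shows how the two streams are fused into one input sequence.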

Liyao Li, Chao Ye, Wentao Ye, Yifei Sun, Zhe Jiang, Haobo Wang, Jiaming Tian, Yiming Zhang, Ningtao Wang, Xing Fu, Gang Chen, Junbo Zhao • 2025

Related benchmarks

Task                      Dataset                 Result           Rank
Table Question Answering  HiTab                   Accuracy 73.73   67
Table Question Answering  WikiTQ                  Accuracy 56.93   65
Free-form QA              FeTaQA                  BLEU 39.01       16
Table Question Answering  WikiSQL                 Accuracy 85.9    16
Structural QA             StructQA                Accuracy 78      15
Table Question Answering  MultiTabQA (geoQuery)   Precision 49.36  7
