
Large Language Models are few(1)-shot Table Reasoners

About

Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners on text reasoning tasks. However, the capability of LLMs on table reasoning tasks remains underexplored. In this paper, we aim to understand how well LLMs can perform table-related tasks with few-shot in-context learning. Specifically, we evaluate LLMs on popular table QA and fact verification datasets such as WikiTableQuestions, FetaQA, TabFact, and FEVEROUS, and find that LLMs are competent at complex reasoning over table structures even though they are not pre-trained on any table corpus. When combined with chain-of-thought prompting, LLMs can achieve very strong performance with only a 1-shot demonstration, on par with some SoTA models. We show that LLMs are even more competent at generating comprehensive long-form answers on FetaQA than tuned T5-large. We further manually studied the reasoning chains elicited from LLMs and found that these chains are highly consistent with the underlying semantic forms. We believe that LLMs can serve as a simple yet generic baseline for future research. The code and data are released at https://github.com/wenhuchen/TableCoT.
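The 1-shot chain-of-thought setup described above can be sketched as prompt construction: linearize the table into text, prepend a single worked demonstration that includes its reasoning chain, then let the model continue from the test instance. The sketch below is illustrative only; the helper names, the demonstration table, and the reasoning text are assumptions, not examples taken from the paper or its repository.

```python
# Minimal sketch of 1-shot chain-of-thought prompting for table QA.
# All tables, questions, and reasoning strings here are hypothetical
# placeholders, not data from the paper.

def linearize_table(header, rows):
    """Flatten a table into a pipe-delimited string an LLM can read."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(c) for c in row) for row in rows]
    return "\n".join(lines)

def build_one_shot_cot_prompt(demo, test_table, test_question):
    """Prepend one worked demonstration (with its reasoning chain)
    before the test instance, leaving the answer for the model."""
    demo_block = (
        f"Table:\n{linearize_table(*demo['table'])}\n"
        f"Question: {demo['question']}\n"
        f"Answer: Let's think step by step. {demo['reasoning']} "
        f"So the answer is {demo['answer']}.\n\n"
    )
    test_block = (
        f"Table:\n{linearize_table(*test_table)}\n"
        f"Question: {test_question}\n"
        f"Answer: Let's think step by step."
    )
    return demo_block + test_block

# Hypothetical demonstration instance.
demo = {
    "table": (["City", "Population"],
              [["Springfield", 30000], ["Shelbyville", 25000]]),
    "question": "Which city has the larger population?",
    "reasoning": "Springfield has 30000 people and Shelbyville has 25000; "
                 "30000 > 25000.",
    "answer": "Springfield",
}

prompt = build_one_shot_cot_prompt(
    demo,
    (["Team", "Wins"], [["Eagles", 11], ["Hawks", 9]]),
    "Which team has more wins?",
)
print(prompt)
```

The resulting string would be sent to an LLM as-is; the trailing "Let's think step by step." cue prompts the model to emit a reasoning chain before its final answer.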

Wenhu Chen • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Table Question Answering | WikiTQ (test) | Accuracy | 52.4 | 92
Table Question Answering | WikiTableQuestions (test) | -- | -- | 86
Fact Verification | TabFact | Accuracy | 92.1 | 73
Table Fact Verification | TabFact small (test) | Accuracy | 0.7861 | 57
Table-based Fact Verification | TabFact | Accuracy | 73.1 | 33
Table Fact Verification | TabFact small | Overall Accuracy | 78.61 | 18
Table Fact Verification | TabFact full (test) | Simple Accuracy | 84.36 | 16
Table Fact Verification | TabFact full | Simple Accuracy | 84.36 | 16
Table Question Answering | WikiTQ Large (>4000 tokens) | Accuracy | 35.1 | 8
Free-form Table Question Answering | FeTaQA (100 randomly chosen samples) | Fluency | 0.96 | 6

(Showing 10 of 11 rows)
