
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering

About

The task of zero-shot commonsense question answering evaluates models on their capacity to reason about general scenarios beyond those presented in specific datasets. Existing approaches for tackling this task leverage external knowledge from CommonSense Knowledge Bases (CSKBs) by pretraining the model on synthetic QA pairs constructed from CSKBs. In these approaches, negative examples (distractors) are formulated by randomly sampling from CSKBs using fairly primitive keyword constraints. However, two bottlenecks limit these approaches: the inherent incompleteness of CSKBs limits the semantic coverage of synthetic QA pairs, and the lack of human annotations makes the sampled negative examples potentially uninformative and contradictory. To tackle these limitations, we propose Conceptualization-Augmented Reasoner (CAR), a zero-shot commonsense question-answering framework that fully leverages the power of conceptualization. Specifically, CAR abstracts a commonsense knowledge triple to many higher-level instances, which increases the coverage of the CSKB and expands the ground-truth answer space, reducing the likelihood of selecting false-negative distractors. Extensive experiments demonstrate that CAR more robustly generalizes to answering questions about zero-shot commonsense scenarios than existing methods, including large language models such as GPT-3.5 and ChatGPT. Our code, data, and model checkpoints are available at https://github.com/HKUST-KnowComp/CAR.
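To make the core idea concrete, here is a minimal, hypothetical sketch of conceptualization-based distractor filtering. It is not the paper's actual implementation: the `CONCEPTS` map, function names, and examples are all illustrative stand-ins for a real conceptualization resource. A distractor whose abstractions overlap those of the gold answer is treated as a likely false negative and discarded.

```python
# Toy conceptualization map: instance -> higher-level concepts.
# A real system would draw these abstractions from a conceptualized CSKB.
CONCEPTS = {
    "coffee": ["drink", "hot beverage"],
    "tea": ["drink", "hot beverage"],
    "rock": ["object"],
}

def conceptualize(instance: str) -> set[str]:
    """Return the instance itself plus its higher-level abstractions."""
    return {instance, *CONCEPTS.get(instance, [])}

def filter_distractors(gold: str, candidates: list[str]) -> list[str]:
    """Keep only candidates whose abstractions are disjoint from the
    gold answer's, since overlapping candidates may also be valid
    answers (false negatives)."""
    gold_concepts = conceptualize(gold)
    return [c for c in candidates if conceptualize(c).isdisjoint(gold_concepts)]

# "tea" shares the abstraction "drink" with "coffee", so it is dropped;
# "rock" survives as a safe distractor.
print(filter_distractors("coffee", ["tea", "rock"]))  # -> ['rock']
```

Expanding the ground-truth side works the same way in reverse: every abstraction of the gold answer joins the acceptable-answer space, so sampling avoids it.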

Weiqi Wang, Tianqing Fang, Wenxuan Ding, Baixuan Xu, Xin Liu, Yangqiu Song, Antoine Bosselut • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy: 78.2 | 776 |
| Physical Interaction Question Answering | PIQA | Accuracy: 78.6 | 323 |
| Physical Commonsense Reasoning | PIQA (val) | Accuracy: 78.6 | 113 |
| Social Interaction Question Answering | SIQA | Accuracy: 64.8 | 85 |
| Abductive Natural Language Inference | aNLI (leaderboard) | Accuracy: 79.6 | 47 |
| Commonsense Question Answering | SocialIQA (SIQA) (val) | Accuracy: 64.0 | 24 |
| Commonsense Question Answering | CommonsenseQA (CSQA) (val) | Accuracy: 69.3 | 23 |
| Commonsense Question Answering | Abductive NLI (aNLI) (val) | Accuracy: 79.6 | 21 |
| Commonsense Question Answering | WinoGrande (WG) (val) | Accuracy: 78.2 | 21 |
