Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

About

Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task and generating fair and informative questions help language models learn more effectively.

Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari • 2020

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	WinoGrande	Accuracy 76	1581
Physical Commonsense Reasoning	PIQA	Accuracy 78	724
Physical Interaction Question Answering	PIQA	Accuracy 79	462
Social Interaction Question Answering	SIQA	Accuracy 63.2	157
Physical Commonsense Reasoning	PIQA (val)	Accuracy 79	118
Social Commonsense Reasoning	SIQA	Accuracy 63.1	118
Commonsense Question Answering	CSQA	Accuracy 67	71
Common Sense Reasoning	WG	Accuracy 76	61
Abductive Commonsense Reasoning	ANLI (test)	Accuracy 76	53
Abductive Natural Language Inference	aNLI (leaderboard)	Accuracy 76	47

Showing 10 of 20 rows
