Look to the Right: Mitigating Relative Position Bias in Extractive Question Answering

About

Extractive question answering (QA) models tend to exploit spurious correlations to make predictions when a training set has unintended biases. This tendency results in models not being generalizable to examples where the correlations do not hold. Determining the spurious correlations QA models can exploit is crucial in building generalizable QA models in real-world applications; moreover, a method needs to be developed that prevents these models from learning the spurious correlations even when a training set is biased. In this study, we discovered that the relative position of an answer, which is defined as the relative distance from an answer span to the closest question-context overlap word, can be exploited by QA models as superficial cues for making predictions. Specifically, we find that when the relative positions in a training set are biased, the performance on examples with relative positions unseen during training is significantly degraded. To mitigate the performance degradation for unseen relative positions, we propose an ensemble-based debiasing method that does not require prior knowledge about the distribution of relative positions. We demonstrate that the proposed method mitigates the models' reliance on relative positions using the biased and full SQuAD dataset. We hope that this study can help enhance the generalization ability of QA models in real-world applications.
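The relative position described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `relative_position`, whitespace tokenization, case-insensitive matching, the lack of stopword filtering, and the sign convention (negative when the nearest overlap word is to the left of the answer, positive when it is to the right, 0 when it lies inside the span) are all assumptions made for illustration.

```python
from typing import List, Optional


def relative_position(
    context_tokens: List[str],
    question_tokens: List[str],
    answer_start: int,
    answer_end: int,
) -> Optional[int]:
    """Signed token distance from the answer span to the nearest overlap word.

    answer_start and answer_end are inclusive token indices into context_tokens.
    Returns None when no context token also appears in the question.
    (Hypothetical helper; sign convention is an assumption.)
    """
    question_vocab = {tok.lower() for tok in question_tokens}
    best: Optional[int] = None
    for i, tok in enumerate(context_tokens):
        if tok.lower() not in question_vocab:
            continue  # not a question-context overlap word
        if i < answer_start:
            d = i - answer_start  # overlap word left of the answer: negative
        elif i > answer_end:
            d = i - answer_end    # overlap word right of the answer: positive
        else:
            d = 0                 # overlap word inside the answer span
        # Keep the overlap word with the smallest absolute distance;
        # on ties the earlier (left-most) one is kept.
        if best is None or abs(d) < abs(best):
            best = d
    return best


context = "The tower was built by Gustave Eiffel in 1889".split()
question = "Who built the tower ?".split()
# Answer span "Gustave Eiffel" covers token indices 5..6.
print(relative_position(context, question, 5, 6))
```

Under these assumptions, a training set is biased when most examples share one value of this distance, and the unseen cases are examples whose distance takes a different value.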

Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa • 2022

Related benchmarks

Task	Dataset	Metric	Result	
Conversational Question Generation	CoQAR Non-Biased	Performance (%)	18.3	6
Conversational Question Generation	CoQAR Biased	Performance	23.2	6
Conversational Question Generation	CANARD Biased	Performance	24.6	6
Conversational Question Generation	CANARD Non-Biased	CQG Non-Biased Performance	21.2	6
Summarization	Newsroom Non-Biased	ROUGE-L	22	5
Knowledge-Grounded Conversation	Mutual Biased	Performance	89.6	5
Knowledge-Grounded Conversation	Mutual Non-Biased	Performance	46.1	5
Summarization	CNN/DM Biased	ROUGE-L	23.9	5
Summarization	CNN/DM Non-Biased	ROUGE-L	0.151	5
Summarization	Newsroom Biased	ROUGE-L	47.5	5

Showing 10 of 16 rows
