Mitigating Label Biases for In-context Learning

About

Various design settings for in-context learning (ICL), such as the choice and order of the in-context examples, can bias a model toward a particular prediction without being reflective of an understanding of the task. While many studies discuss these design choices, there have been few systematic investigations into categorizing them and mitigating their impact. In this work, we define a typology for three types of label biases in ICL for text classification: vanilla-label bias, context-label bias, and domain-label bias (which we conceptualize and detect for the first time). Our analysis demonstrates that prior label bias calibration methods fall short of addressing all three types of biases. Specifically, domain-label bias restricts LLMs to random-level performance on many tasks regardless of the choice of in-context examples. To mitigate the effect of these biases, we propose a simple bias calibration method that estimates a language model's label bias using random in-domain words from the task corpus. After controlling for this estimated bias when making predictions, our novel domain-context calibration significantly improves the ICL performance of GPT-J and GPT-3 on a wide range of tasks. The gain is substantial on tasks with large domain-label bias (up to 37% in Macro-F1). Furthermore, our results generalize to models with different scales, pretraining methods, and manually-designed task instructions, showing the prevalence of label biases in ICL.

Yu Fei, Yifan Hou, Zeming Chen, Antoine Bosselut• 2023

Related benchmarks

Task	Dataset	Result
Multi-task Language Understanding	MMLU	Accuracy51.81	881
Natural Language Inference	RTE	Accuracy66.21	590
Natural Language Understanding	GLUE (test)	--	416
Subjectivity Classification	Subj	Accuracy50.04	343
Text Classification	AG News (test)	--	293
Text Classification	TREC	Accuracy52.81	281
Question Classification	TREC	Accuracy80.5	262
Word Sense Disambiguation	WiC	Avg Accuracy52.4	261
Question Answering	ARC	Accuracy64.88	230
Topic Classification	AG-News	Accuracy89.34	225

Showing 10 of 98 rows

...

Other info

Code

Follow for update

@wizwand_team Discord