
Learning to Select In-Context Demonstration Preferred by Large Language Model

About

In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks during inference using only a few demonstrations. However, ICL performance is highly dependent on the selection of these demonstrations. Recent work explores retrieval-based methods for selecting query-specific demonstrations, but these approaches often rely on surrogate objectives such as metric learning and fail to directly optimize ICL performance. Consequently, they struggle to identify truly beneficial demonstrations. Moreover, their discriminative retrieval paradigm is ineffective when the candidate pool lacks sufficient high-quality demonstrations. To address these challenges, we propose GenICL, a novel generative preference learning framework that leverages LLM feedback to directly optimize demonstration selection for ICL. Experiments on 19 datasets across 11 task categories demonstrate that GenICL outperforms existing methods at selecting the most effective demonstrations, leading to better ICL performance.
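The core idea described above, learning a demonstration selector from LLM feedback via preference learning rather than a surrogate retrieval objective, can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's implementation: it uses a toy linear scorer, synthetic embeddings, and a Bradley-Terry preference loss in place of GenICL's actual generative objective, with "LLM feedback" simulated by labeling which of two candidate demonstrations is the winner.

```python
import math
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feature(query, demo):
    # Toy joint feature for a (query, demonstration) pair:
    # elementwise product of their embeddings.
    return [q * d for q, d in zip(query, demo)]

def train_selector(prefs, dim, lr=0.5, epochs=200):
    """prefs: (query, winner_demo, loser_demo) triples, where the winner
    received higher (simulated) LLM feedback, e.g. a higher likelihood of
    the gold answer when that demonstration was included in the prompt."""
    w = [0.0] * dim
    for _ in range(epochs):
        for query, win, lose in prefs:
            # Bradley-Terry preference model: P(win > lose) = sigmoid(s_win - s_lose).
            margin = dot(w, feature(query, win)) - dot(w, feature(query, lose))
            g = 1.0 / (1.0 + math.exp(margin))  # gradient scale of -log sigmoid(margin)
            fw, fl = feature(query, win), feature(query, lose)
            for i in range(dim):
                w[i] += lr * g * (fw[i] - fl[i])
    return w

def select(query, pool, w, k=1):
    # At inference, rank the candidate pool by learned score and take the top-k.
    return sorted(pool, key=lambda d: dot(w, feature(query, d)), reverse=True)[:k]

# Synthetic preference data: demonstrations aligned with the query direction
# are treated as the ones the LLM "preferred".
dim = 4
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
prefs = []
for q in queries:
    good = [x + random.gauss(0, 0.1) for x in q]   # similar demo: high feedback
    bad = [-x + random.gauss(0, 0.1) for x in q]   # dissimilar demo: low feedback
    prefs.append((q, good, bad))

w = train_selector(prefs, dim)
best = select(queries[0], [prefs[0][1], prefs[0][2]], w)[0]
```

After training, `select` should rank the preferred demonstration above the dispreferred one for a held-in query; the point of the sketch is only the training signal, direct pairwise feedback from the downstream LLM instead of a metric-learning proxy.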

Zheng Zhang, Shaocheng Lan, Lei Song, Jiang Bian, Yexin Li, Kan Ren • 2025

Related benchmarks

Task                     | Dataset       | Metric   | Result | Rank
Commonsense Reasoning    | HellaSwag     | Accuracy | 74.6   | 1460
Mathematical Reasoning   | MATH          | Accuracy | 88.4   | 643
Natural Language Inference | RTE         | Accuracy | 72.9   | 367
Question Answering       | GPQA          | Accuracy | 65.2   | 258
Reading Comprehension    | BoolQ         | Accuracy | 78.1   | 219
Natural Language Inference | SNLI        | Accuracy | 84.6   | 174
Topic Classification     | AG-News       | Accuracy | 92.6   | 173
Sentiment Analysis       | SST-2         | Accuracy | 95     | 156
Common Sense Reasoning   | COPA          | Accuracy | 86     | 138
Sentiment Analysis       | SST-2 (test)  | Accuracy | 94.6   | 136
Showing 10 of 26 rows

Other info

Code
