More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives

About

Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as ICL demonstrations increase from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we introduce DrICL, a novel optimization method that enhances model performance through Differentiated and Reweighting objectives. Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. Locally, it dynamically adjusts the weighting of many-shot demonstrations by leveraging cumulative advantages inspired by reinforcement learning, thereby mitigating the impact of noisy data. Recognizing the lack of multi-task datasets with diverse many-shot distributions, we develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks that covers shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for both fine-tuning and evaluation purposes. Experimental results demonstrate that LLMs enhanced with DrICL achieve significant improvements in many-shot setups across various tasks, including both in-domain and out-of-domain scenarios. We release the code and dataset at https://github.com/xiaoqzhwhu/DrICL in the hope of facilitating further research in many-shot ICL.
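
The abstract names two ideas, a global "differentiated" objective and a local reweighting of demonstrations, but gives no formulas. The sketch below is only one possible reading, written to make the idea concrete; it is not the authors' implementation. Every function name, the running-mean baseline, the softmax weighting, the hinge penalty, and the `temperature` and `margin` parameters are assumptions introduced for illustration. It takes per-demonstration NLL values as plain floats, gives more weight to demonstrations whose loss beats the running mean of earlier ones, and adds a penalty when the reweighted many-shot loss is not below the zero-shot loss.

```python
import math


def advantage_weights(shot_losses, temperature=1.0):
    """Hypothetical weighting: softmax over (running baseline - loss).

    The baseline for shot k is the mean NLL of shots 0..k-1, so a shot
    with an unusually high loss (possibly noisy) gets a negative
    advantage and therefore a small weight.
    """
    advantages = []
    running_sum = 0.0
    for k, loss in enumerate(shot_losses):
        baseline = running_sum / k if k > 0 else loss  # first shot: advantage 0
        advantages.append(baseline - loss)
        running_sum += loss
    # Numerically stable softmax over the advantages.
    peak = max(advantages)
    exps = [math.exp((a - peak) / temperature) for a in advantages]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_many_shot_loss(shot_losses, temperature=1.0):
    """Weighted sum of per-shot NLLs using the assumed advantage weights."""
    weights = advantage_weights(shot_losses, temperature)
    return sum(w * l for w, l in zip(weights, shot_losses))


def differentiated_loss(zero_shot_loss, shot_losses, margin=0.0, temperature=1.0):
    """Assumed global term: penalize many-shot loss that is not below zero-shot loss."""
    many = reweighted_many_shot_loss(shot_losses, temperature)
    gap_penalty = max(0.0, many - zero_shot_loss + margin)
    return many + gap_penalty


if __name__ == "__main__":
    losses = [1.0, 0.9, 3.0, 0.8]  # made-up values; the third shot plays the noisy one
    print(advantage_weights(losses))
    print(differentiated_loss(zero_shot_loss=2.0, shot_losses=losses))
```

In this toy setup the high-loss third demonstration receives the smallest weight, so the reweighted loss falls below the plain average of the four values. In a real training loop the losses would be tensors from the model and the weights would typically be treated as constants when backpropagating; that choice is likewise an assumption here.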

Xiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu, Jian Luan, Shuo Shang, Xiuying Chen, Rui Yan • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Question Answering | OpenBookQA | Accuracy 80 | 465 |
| Question Answering | ARC | Accuracy 81 | 154 |
| Clustering | CLSClusteringS2S | Accuracy 89 | 68 |
| Sentiment Extraction | TweetSentimentExtraction | Accuracy 0.83 | 60 |
| Text Clustering | CLSClusteringS2S id (test) | Accuracy 88 | 44 |
| Text Clustering | ArxivClusteringS2S ood (test) | Accuracy 42 | 44 |
| Mathematical Reasoning | GSM8K (test) | Accuracy 32 | 24 |
| Retrieval | EcomRetrieval in-domain (test) | Accuracy 94 | 16 |
| Summarization | XSUM in-domain (test) | D3 Score 20 | 16 |
| Retrieval | VideoRetrieval out-of-domain (test) | Accuracy 100 | 16 |
Showing 10 of 14 rows
