Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

About

While many-shot in-context learning (ICL) achieves remarkable performance, prior studies of its scaling behavior have mainly focused on non-reasoning tasks. In this work, we study many-shot ICL on reasoning tasks, with a particular focus on many-shot chain-of-thought in-context learning (CoT-ICL). Analyzing both non-reasoning and reasoning tasks, and both non-reasoning and reasoning-oriented LLMs, we identify several distinctive properties of many-shot CoT-ICL. We further interpret these findings by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on a math task with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
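As a rough illustration of principle (ii), the sketch below assembles a many-shot CoT prompt with demonstrations sorted from easiest to hardest. It is not CDS: the abstract does not describe how CDS orders demonstrations, so plain easiest-first sorting is used as a stand-in. The `Demo` type, the `difficulty` score, and the `Q:`/`Reasoning:`/`A:` prompt layout are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    question: str
    rationale: str     # chain-of-thought text shown to the model
    answer: str
    difficulty: float  # hypothetical score; lower means easier for the target model


def order_demos(demos):
    """Sort easiest-first, a simple stand-in for a smooth progression (not CDS)."""
    return sorted(demos, key=lambda d: d.difficulty)


def build_prompt(demos, query):
    """Concatenate the ordered demonstrations, then append the unanswered query."""
    blocks = [
        f"Q: {d.question}\nReasoning: {d.rationale}\nA: {d.answer}"
        for d in order_demos(demos)
    ]
    blocks.append(f"Q: {query}\nReasoning:")
    return "\n\n".join(blocks)
```

How the difficulty score is obtained is left open here; principle (i) suggests it should reflect how easily the target model itself understands each demonstration.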

Tsz Ting Chung, Lemao Liu, Mo Yu, Dit-Yan Yeung • 2026

Related benchmarks

Task	Dataset	Result
Geometry proof generation	Geometry	Accuracy 81.21	24
Logical reasoning	DetectiveQA	Accuracy (DetectiveQA) 88.31	24
Number theory problem solving	Number Theory	Accuracy 92.59	24
