Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention

About

Many-shot in-context learning has recently shown promise as an alternative to finetuning, with the major advantage that the same model can be served for multiple tasks. However, this shifts the computational burden from training-time to inference-time, making deployment of many-shot ICL challenging to justify in-practice. This cost is further increased if a custom demonstration set is retrieved for each inference example. We present Dynamic Block-Sparse Attention, a training-free framework for retrieval-based many-shot in-context learning. By combining carefully designed block-sparse attention and retrieval of cached groups of demonstrations, we achieve comparable per-example latency to finetuning while maintaining on average >95% of the best method's accuracy across strong ICL and finetuning baselines. We hope that this will further enable the deployment of many-shot ICL at scale.

Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch• 2025

Related benchmarks

Task	Dataset	Result
Text Classification	TREC	Accuracy95	281
Intent Classification	Banking77	Accuracy89	260
Question Answering	SQuAD 2.0	--	215
Natural Language Understanding	QNLI	Exact Match80.7	14
Natural Language Understanding	COLA	Exact Match65.7	14
Mathematical Reasoning	SVAMP	Pass@1 Accuracy73.7	14
Natural Language Understanding	PIQA	Exact Match40.4	12
Text Classification	TREC Fine	Accuracy0.88	8
Text Classification	Clinic	Accuracy90	8
Text Classification	NLU	Accuracy88	8

Showing 10 of 15 rows

Other info

Code

Follow for update

@wizwand_team Discord