# UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy

## About
In-context learning (ICL) enables training-free adaptation via demonstrations but remains highly sensitive to example selection and formatting. In unified multimodal models that span understanding and generation, this sensitivity is exacerbated by cross-modal interference and varying cognitive demands, so ICL efficacy is often non-monotonic and highly task-dependent. To diagnose these behaviors, we introduce a six-level capability-oriented taxonomy that categorizes the functional role of demonstrations, from basic perception to high-order discernment. Guided by this cognitive framework, we construct UniICL-760K, a large-scale corpus of curated 8-shot ICL episodes across 15 subtasks, together with UniICL-Bench for rigorous, controlled evaluation. As an architectural intervention to stabilize few-shot adaptation, we propose the Context-Adaptive Prototype Modulator, a lightweight, plug-and-play module. Evaluations on UniICL-Bench show that our approach yields highly competitive unified results, outperforming larger multimodal large language model baselines on most understanding ICL tasks. Data and code will be available soon at https://github.com/xuyicheng-zju/UniICL.
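To make the idea of a lightweight, plug-and-play modulator concrete, below is a minimal sketch of one plausible design: pool the demonstration (shot) features of an ICL episode into a context prototype and use it to gate the query representation. This is an illustrative assumption, not the paper's actual implementation; the class name, shapes, and gating scheme are hypothetical.

```python
# Illustrative sketch only: a prototype-based modulator for few-shot ICL features.
# All names and dimensions are assumptions, not the released UniICL code.
import torch
import torch.nn as nn


class ContextAdaptivePrototypeModulator(nn.Module):
    """Pools demonstration features into a prototype, then produces a
    per-channel gate that blends the query features with that prototype."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
            nn.Sigmoid(),
        )

    def forward(self, query_feats: torch.Tensor, demo_feats: torch.Tensor) -> torch.Tensor:
        # query_feats: (batch, dim); demo_feats: (batch, n_shots, dim), e.g. n_shots = 8
        prototype = demo_feats.mean(dim=1)                      # context prototype from the shots
        gate = self.gate(torch.cat([query_feats, prototype], dim=-1))
        return query_feats * gate + prototype * (1.0 - gate)    # modulated query representation


# Toy usage with an 8-shot episode of 1024-d features.
modulator = ContextAdaptivePrototypeModulator(dim=1024)
out = modulator(torch.randn(2, 1024), torch.randn(2, 8, 1024))
print(out.shape)  # torch.Size([2, 1024])
```

Because the module only consumes and returns feature tensors, a design like this could in principle be inserted between the context encoder and the decoder of a unified model without retraining the backbone, which is what "plug-and-play" suggests here.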
## Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Understanding | UniICL-Bench | Perception Score | 80.9 | 33 |
| Generation | UniICL-Bench | Perception | 86.5 | 15 |
| In-Context Learning Stability Analysis | UniICL-Bench (test) | Random Replace Error (Und.) | 2.1 | 4 |
| Image Generation | Nexus-Gen-V2 (generation-side benchmark, 350 episodes) | Semantic Intent Win Rate | 64.7 | 1 |