In-Context Learning for Pure Exploration in Continuous Spaces

About

In active sequential testing, also termed pure exploration, a learner must adaptively acquire information to identify an unknown ground-truth hypothesis with as few queries as possible. This problem, originally studied by Chernoff in 1959, has several applications: classical formulations include Best-Arm Identification (BAI) in bandits, where actions index hypotheses, and generalized search problems, where strategically chosen queries reveal partial information about a hidden label. In many modern settings, however, the hypothesis space is continuous and naturally coincides with the query/action space: for example, identifying an optimal action in a continuous-armed bandit, localizing an $\epsilon$-ball contained in a target region, or estimating the minimizer of an unknown function from a sequence of observations. In this work, we study pure exploration in such continuous spaces and introduce Continuous In-Context Pure Exploration for this regime. Our algorithm, C-ICPE-TS, meta-trains deep neural policies to map observation histories to (i) the next continuous query action and (ii) a predicted hypothesis, thereby learning transferable sequential testing strategies directly from data. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models. We validate C-ICPE-TS across a range of benchmarks spanning continuous best-arm identification, region localization, and function minimizer identification.
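The interaction pattern described above (a fixed policy that maps the observation history to a next query and a predicted hypothesis, with no parameter updates at inference time) can be illustrated with a minimal sketch. This is not the C-ICPE-TS implementation: `placeholder_policy` is a hypothetical stand-in that queries uniformly at random and predicts the best point seen so far, the search domain `[-1, 1]^dim` is an assumption, and the success check `is_eps_correct` assumes that a prediction counts as correct when it lies within Euclidean distance ε of the target.

```python
import random
from typing import Callable, List, Tuple

Point = List[float]
History = List[Tuple[Point, float]]  # (query, observation) pairs


def placeholder_policy(history: History, dim: int,
                       rng: random.Random) -> Tuple[Point, Point]:
    """Hypothetical stand-in for a meta-trained policy.

    Returns (next_query, predicted_hypothesis). A trained network would
    compute both from the history; here the query is uniform random on
    [-1, 1]^dim and the prediction is the lowest-valued point seen so far.
    """
    next_query = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    if history:
        predicted = min(history, key=lambda pair: pair[1])[0]
    else:
        predicted = [0.0] * dim
    return next_query, predicted


def run_episode(f: Callable[[Point], float], dim: int, budget: int,
                seed: int = 0) -> Tuple[Point, History]:
    """Query f for `budget` rounds with a fixed policy, then predict."""
    rng = random.Random(seed)
    history: History = []
    for _ in range(budget):
        query, _ = placeholder_policy(history, dim, rng)
        history.append((query, f(query)))  # the policy itself never changes
    _, predicted = placeholder_policy(history, dim, rng)
    return predicted, history


def is_eps_correct(predicted: Point, target: Point, eps: float) -> bool:
    """Assumed criterion: Euclidean distance to the target is at most eps."""
    dist = sum((p - t) ** 2 for p, t in zip(predicted, target)) ** 0.5
    return dist <= eps
```

For example, calling `run_episode` on a simple quadratic with a known minimizer and passing the result to `is_eps_correct` gives one success/failure outcome per episode; averaging such outcomes over many sampled tasks would give a correctness rate under the assumed criterion.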

Alessio Russo, Yin-Ching Lee, Ryan Welch, Aldo Pacchiano • 2026

Related benchmarks

Task	Dataset	Metric	Result
Best Arm Identification	ε-Best-Arm Problem 10D, ε=0.1	Correctness	91.5	5
Best Arm Identification	ε-Best-Arm Problem 10D, ε=0.2	Correctness	90.5	5
Pure Exploration	Ackley Function 4D ε=0.1	Correctness	89.5	5
Pure Exploration	Ackley Function 5D ε=0.1	Correctness	90.2	5
Pure Exploration	Ackley Function 3D ε=0.2	Correctness	0.924	5
Pure Exploration	Ackley Function 4D ε=0.2	Correctness	92.1	5
Pure Exploration	Ackley Function 5D ε=0.2	Correctness	87.3	5
Best Arm Identification	ε-Best-Arm Problem 6D, ε=0.1	Correctness	90.4	5
Best Arm Identification	ε-Best-Arm Problem 8D, ε=0.1	Correctness	91.7	5
Best Arm Identification	ε-Best-Arm Problem 6D, ε=0.2	Correctness	91.4	5

Showing 10 of 12 rows
