In-Context Learning for Pure Exploration in Continuous Spaces
About
In active sequential testing, also termed pure exploration, a learner adaptively acquires information to identify an unknown ground-truth hypothesis with as few queries as possible. This problem, originally studied by Chernoff in 1959, has several applications. Classical formulations include Best-Arm Identification (BAI) in bandits, where actions index hypotheses, and generalized search problems, where strategically chosen queries reveal partial information about a hidden label. In many modern settings, however, the hypothesis space is continuous and naturally coincides with the query/action space: for example, identifying an optimal action in a continuous-armed bandit, localizing an $\epsilon$-ball contained in a target region, or estimating the minimizer of an unknown function from a sequence of observations. In this work, we study pure exploration in such continuous spaces and formulate Continuous In-Context Pure Exploration for this regime. We then introduce C-ICPE-TS, an algorithm that meta-trains deep neural policies to map observation histories to (i) the next continuous query action and (ii) a predicted hypothesis, thereby learning transferable sequential testing strategies directly from data. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models. We validate C-ICPE-TS on a range of benchmarks spanning continuous best-arm identification, region localization, and function minimizer identification.
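The inference-time interaction described above (a history goes in, a continuous query and a predicted hypothesis come out) can be illustrated with a minimal sketch. This is not the C-ICPE-TS implementation: `placeholder_query` and `placeholder_infer` are hypothetical hand-written stand-ins for the meta-trained networks, the fixed `horizon` is an assumption (the abstract does not specify a stopping rule), and the 1D quadratic task is chosen only for illustration.

```python
from typing import Callable, List, Tuple

# A history is the list of (query, observation) pairs gathered so far.
History = List[Tuple[float, float]]

def run_episode(
    f: Callable[[float], float],
    query_policy: Callable[[History], float],
    infer: Callable[[History], float],
    horizon: int,
) -> Tuple[float, History]:
    """Query f for `horizon` steps, then predict the hypothesis (here: the minimizer)."""
    history: History = []
    for _ in range(horizon):
        x = query_policy(history)   # (i) next continuous query from the history
        history.append((x, f(x)))   # observe and extend the history
    return infer(history), history  # (ii) predicted hypothesis from the history

def placeholder_query(history: History) -> float:
    """Hypothetical stand-in for a learned query policy on the domain [-1, 1]."""
    coarse = [-1.0, -0.5, 0.0, 0.5, 1.0]
    t = len(history)
    if t < len(coarse):
        return coarse[t]            # start with a coarse sweep of the domain
    best_x, _ = min(history, key=lambda p: p[1])
    step = 0.25 * 0.5 ** ((t - len(coarse)) // 2)  # halve the step every two queries
    sign = 1.0 if t % 2 == 0 else -1.0             # alternate sides of the best point
    return max(-1.0, min(1.0, best_x + sign * step))

def placeholder_infer(history: History) -> float:
    """Hypothetical stand-in for a learned inference head: best query seen so far."""
    return min(history, key=lambda p: p[1])[0]

def is_eps_correct(prediction: float, target: float, eps: float) -> bool:
    """A prediction counts as correct if it lies within eps of the true hypothesis."""
    return abs(prediction - target) <= eps

if __name__ == "__main__":
    target = 0.3
    prediction, history = run_episode(
        lambda x: (x - target) ** 2, placeholder_query, placeholder_infer, horizon=25
    )
    print(prediction, is_eps_correct(prediction, target, eps=0.1))
```

In the method described by the abstract, both placeholder functions would be replaced by neural networks trained across many tasks, so that no parameters are updated inside `run_episode`.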
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Best Arm Identification | ε-Best-Arm Problem 10D, ε=0.1 | Correctness | 91.5 | 5 |
| Best Arm Identification | ε-Best-Arm Problem 10D, ε=0.2 | Correctness | 90.5 | 5 |
| Pure Exploration | Ackley Function 4D, ε=0.1 | Correctness | 89.5 | 5 |
| Pure Exploration | Ackley Function 5D, ε=0.1 | Correctness | 90.2 | 5 |
| Pure Exploration | Ackley Function 3D, ε=0.2 | Correctness | 0.924 | 5 |
| Pure Exploration | Ackley Function 4D, ε=0.2 | Correctness | 92.1 | 5 |
| Pure Exploration | Ackley Function 5D, ε=0.2 | Correctness | 87.3 | 5 |
| Best Arm Identification | ε-Best-Arm Problem 6D, ε=0.1 | Correctness | 90.4 | 5 |
| Best Arm Identification | ε-Best-Arm Problem 8D, ε=0.1 | Correctness | 91.7 | 5 |
| Best Arm Identification | ε-Best-Arm Problem 6D, ε=0.2 | Correctness | 91.4 | 5 |
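For readers unfamiliar with the Ackley benchmarks listed above, the following sketch shows the textbook Ackley function and one plausible reading of the correctness criterion. The parameter values (`a=20`, `b=0.2`, `c=2π`) are the standard ones and are assumed here; the source does not state which parametrization, domain, or shift the benchmark uses. The helper `within_eps` is hypothetical and assumes Euclidean distance to the minimizer.

```python
import math
from typing import Sequence

def ackley(x: Sequence[float], a: float = 20.0, b: float = 0.2,
           c: float = 2.0 * math.pi) -> float:
    """Textbook Ackley function; its global minimum is 0 at the origin."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(c * v) for v in x) / d
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e

def within_eps(prediction: Sequence[float], minimizer: Sequence[float],
               eps: float) -> bool:
    """Assumed criterion: the predicted minimizer is within eps (Euclidean) of the true one."""
    return math.dist(prediction, minimizer) <= eps

if __name__ == "__main__":
    origin = [0.0] * 4
    print(ackley(origin))                          # value at the global minimizer
    print(within_eps([0.05] * 4, origin, eps=0.2))
```

Under this reading, a reported correctness value would be the fraction of test tasks for which `within_eps` holds, but that interpretation is an assumption rather than something the table itself states.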