Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?
About
Learning decompositions of expensive-to-evaluate black-box functions promises to scale Bayesian optimisation (BO) to high-dimensional problems. However, the success of these techniques depends on finding proper decompositions that accurately represent the black-box. While previous works learn those decompositions based on data, we investigate data-independent decomposition sampling rules in this paper. We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally across the search space. Then, we formally show that a random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate as provided by the decomposition. Those results motivate the development of the random decomposition upper-confidence bound algorithm (RDUCB) that is straightforward to implement - (almost) plug-and-play - and, surprisingly, yields significant empirical gains compared to the previous state-of-the-art on a comprehensive set of benchmarks. We also confirm the plug-and-play nature of our modelling component by integrating our method with HEBO, showing improved practical gains in the highest dimensional tasks from Bayesmark.
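To make the idea of a data-independent, tree-based decomposition concrete, here is a minimal sketch in Python. It rests on assumptions that the abstract does not confirm: each tree edge is treated as one pairwise additive component, and the tree is drawn by attaching every dimension to a randomly chosen earlier one. That yields a random recursive tree, which is not necessarily the sampling rule used in the paper. The names `sample_random_tree` and `additive_value` are hypothetical and are not taken from the authors' code.

```python
import random


def sample_random_tree(num_dims, rng=None):
    """Return the edges of a random tree over dimensions 0..num_dims-1.

    Assumption: dimensions are shuffled, then each one is attached to a
    uniformly chosen dimension that was placed earlier. No data is used.
    """
    if rng is None:
        rng = random.Random()
    order = list(range(num_dims))
    rng.shuffle(order)
    edges = []
    for i in range(1, num_dims):
        parent = order[rng.randrange(i)]  # any previously placed dimension
        edges.append((parent, order[i]))
    return edges


def additive_value(x, edges, component):
    """Evaluate an additive model: one pairwise component per tree edge."""
    return sum(component(x[i], x[j]) for i, j in edges)


# Example: a fresh decomposition could be drawn at every BO iteration.
edges = sample_random_tree(6, random.Random(0))
value = additive_value([0.5] * 6, edges, lambda a, b: a * b)
```

A tree over D dimensions always has D - 1 edges, so the surrogate in this sketch has a fixed number of low-dimensional components regardless of which tree is drawn. In an actual BO loop, `component` would be replaced by a Gaussian process term per edge rather than a fixed function.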
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| High-dimensional optimization | MSLR | Convergence Value | -8.8035 | 21 |
| High-dimensional optimization | Lasso-Hard | Convergence Value | 11.6843 | 20 |
| High-dimensional optimization | LIMO | Convergence Value | -4.075 | 20 |
| Function Optimization | Rosenbrock D=1000 | Convergence Value | 9.16e+5 | 19 |
| Function Optimization | Sphere D=1000 | Final Value | 174.8 | 19 |
| Function Optimization | Levy D=1000 | Convergence Value | 198.5 | 19 |
| Function Optimization | Dixon D=1000 | Convergence Value | 1.57e+6 | 19 |
| Function Optimization | Michalewicz D=1000 | Convergence Value | -6.4992 | 19 |
| Function Optimization | Griewank D=1000 | Convergence Value (Statistic) | 164.7 | 19 |