Dealing with Categorical and Integer-valued Variables in Bayesian Optimization with Gaussian Processes

About

Bayesian Optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression, and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. The acquisition function guides the optimization process and measures the expected utility of evaluating the objective at a new point. GPs assume continuous input variables. When this is not the case, for example when some of the input variables take categorical or integer values, one has to introduce extra approximations. Consider a suggested input location taking values on the real line. Before evaluating the objective, a common approach is to use a one-hot encoding approximation for categorical variables, or to round to the closest integer in the case of integer-valued variables. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are categorical or integer-valued. We illustrate in both synthetic and real experiments the utility of our approach, which significantly improves the results of standard BO methods using Gaussian processes on problems with categorical or integer-valued variables.
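The common approach mentioned in the abstract (round integer-valued variables, pick the largest entry of each one-hot block) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `snap_to_valid`, the flat-list input layout, and the `integer_dims` / `categorical_groups` arguments are all hypothetical. The last lines show one plausible way the mismatch can hurt: two distinct real-valued suggestions collapse to the same evaluated configuration, while a GP fitted on the unrounded inputs would still treat them as different points.

```python
def snap_to_valid(x, integer_dims, categorical_groups):
    """Map a real-valued suggestion to a valid input before evaluation.

    x: list of floats proposed by the acquisition optimizer.
    integer_dims: indices of integer-valued variables (hypothetical layout).
    categorical_groups: list of index lists, one per categorical variable,
        each covering that variable's one-hot block (hypothetical layout).
    """
    snapped = list(x)
    for i in integer_dims:
        # Round to the closest integer (Python's round sends exact .5 ties
        # to the nearest even integer).
        snapped[i] = float(round(x[i]))
    for group in categorical_groups:
        # One-hot approximation: the largest entry in the block wins.
        winner = max(group, key=lambda j: x[j])
        for j in group:
            snapped[j] = 1.0 if j == winner else 0.0
    return snapped


# One integer variable (index 0), one 3-level categorical (indices 1-3),
# and one ordinary continuous variable (index 4).
integer_dims = [0]
categorical_groups = [[1, 2, 3]]

a = snap_to_valid([2.7, 0.2, 0.5, 0.3, 0.91], integer_dims, categorical_groups)
b = snap_to_valid([3.2, 0.1, 0.6, 0.3, 0.91], integer_dims, categorical_groups)

print(a)       # [3.0, 0.0, 1.0, 0.0, 0.91]
print(a == b)  # True: two different suggestions, one evaluated configuration
```

Only the integer and categorical coordinates are altered; continuous coordinates pass through unchanged.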

Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato • 2018

Related benchmarks

Task	Dataset	Result
Black-box Optimization	Butternut Squash (BS) function variants strict tolerance	Mean Rank 6.6	19
Black-box optimization ranking	BS function 20 variants loose tolerance	Mean Rank 6.9	19
Composite score ranking	Butternut Squash function variants discrete domains medium tolerance	Mean Rank 6.3	19
