Deep Kernel Learning
About
We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure-exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost $O(n)$ for $n$ training points, and predictions cost $O(1)$ per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models and over stand-alone deep architectures.
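The core construction above is $k(x, x') = k_{\text{base}}(g(x), g(x'))$: a deep network $g$ warps the inputs before a base kernel is applied, and everything is trained jointly through the GP marginal likelihood. The following is a minimal NumPy sketch of that idea only, under simplifying assumptions: it uses an RBF base kernel and a small fixed network (the paper uses a spectral mixture kernel with a learned deep architecture), and a naive $O(n^3)$ Cholesky factorization rather than the paper's local kernel interpolation and Kronecker/Toeplitz algebra that yield $O(n)$ training cost. All names (`net`, `deep_kernel`, the weight shapes) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fixed weights standing in for a learned deep architecture (illustrative only).
W1 = rng.normal(size=(1, 8))
W2 = rng.normal(size=(8, 2)) / np.sqrt(8)

def net(X):
    """Deep feature map g(x): warps inputs before the base kernel sees them."""
    return np.tanh(np.tanh(X @ W1) @ W2)

def rbf(A, B, lengthscale=1.0):
    """Base kernel (RBF here for brevity; the paper uses a spectral mixture kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def deep_kernel(A, B):
    # The deep kernel construction: k(x, x') = k_base(g(x), g(x')).
    return rbf(net(A), net(B))

def neg_log_marginal_likelihood(X, y, noise=0.1):
    """GP negative log marginal likelihood -- the joint training objective
    for both kernel and network parameters (naive O(n^3) version)."""
    n = len(X)
    K = deep_kernel(X, X) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

# Tiny regression example: 20 points of a sine curve.
X = np.linspace(-3, 3, 20)[:, None]
y = np.sin(X).ravel()
nll = neg_log_marginal_likelihood(X, y)
```

In a full implementation, `nll` would be differentiated with respect to the network weights and kernel hyperparameters and minimized; the paper's scalability comes from replacing the dense Cholesky step with structured inducing-point approximations.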
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| GP regression | Kernel Cookbook 1.0 (test) | MSE | 2.52e-4 | 35 |
| Regression | elevators (test) | RMSE | 0.084 | 19 |
| Zero-shot performance prediction | UDPOS | MAE | 6.02 | 18 |
| Zero-shot performance prediction | XNLI | MAE | 2.16 | 18 |
| Zero-shot performance prediction | WikiAnn | MAE | 11.51 | 18 |
| Few-shot regression | Periodic functions in-range (test) | MSE | 2.08 | 10 |
| Regression | Protein (test) | RMSE | 0.46 | 10 |
| Zero-shot performance prediction | Tatoeba | MAE | 6.38 | 9 |
| Performance prediction | PAWS | MAE | 1.27 | 9 |
| Performance prediction | XQuAD | MAE | 4.13 | 9 |