GAUCHE: A Library for Gaussian Processes in Chemistry
About
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche
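To make the idea of a kernel over bit vectors concrete, here is a minimal sketch of the Tanimoto (Jaccard) similarity, a common kernel choice for binary molecular fingerprints. It is written in plain Python for illustration and is not GAUCHE's API: the function names `tanimoto_kernel` and `gram_matrix`, the toy fingerprints, and the convention of returning 1.0 for two all-zero vectors are all assumptions made for this example.

```python
from typing import List, Sequence


def tanimoto_kernel(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto similarity <x, y> / (|x|^2 + |y|^2 - <x, y>) for bit vectors."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    if denom == 0:
        # Assumption: two all-zero fingerprints are treated as identical.
        return 1.0
    return dot / denom


def gram_matrix(fingerprints: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise kernel matrix, as a GP would need for its covariance."""
    return [[tanimoto_kernel(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical 6-bit fingerprints standing in for three molecules.
fingerprints = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
]
K = gram_matrix(fingerprints)
```

In this sketch `K` is symmetric with ones on the diagonal; molecules sharing more "on" bits get a larger kernel value, and molecules sharing none get zero. A Gaussian process would use such a matrix as the covariance between training molecules.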
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Virtual Screening | 100 Proteins | Median Percent Lift (Avg Top-10) | 31 | 45 |
| Protein hit discovery | ChEMBL Proteins | Win % (Avg Top-10, PIC 7.0) | 9 | 20 |
| Ligand discovery | 100 Proteins Min Top-3 endpoint | MLT (Target PIC 7.0) | 13 | 11 |
| Ligand discovery | 100 Proteins Average Top-10 endpoint | MLT Score (PIC 7.0) | 18 | 11 |
| Ligand scoring | 1,000,000 ligands | Inference Time (s) | 39 | 4 |