Conditioning by adaptive sampling for robust design

About

We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black-box, stochastic "oracle" predictive functions, each of which maps from the input design space (e.g., protein sequences) to a distribution over a property of interest (e.g., protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes "realistic" inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.
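
To make the last sentence of the abstract concrete, here is a minimal sketch of model-based adaptive sampling on a toy problem. It is not the authors' implementation. Everything in it is an assumption made for illustration: the factorized categorical generative model, the uniform prior, the Gaussian toy oracle that counts matches to a hidden target, the quantile-based threshold schedule, and all names (`adaptive_sampling`, `oracle_mean`, `prob_exceeds`, and so on). Each round samples from the current model, weights every sample by the prior-to-model density ratio times the oracle's probability of reaching the threshold, and refits the model by weighted maximum likelihood.

```python
import math
import numpy as np

ALPHABET, LENGTH, SIGMA = 4, 8, 1.0  # toy sizes and oracle noise (assumed)

def sample(probs, n, rng):
    """Draw n sequences from a per-position categorical model of shape (L, A)."""
    cdf = np.cumsum(probs, axis=1)
    u = rng.random((n, probs.shape[0], 1))
    idx = (u > cdf[None, :, :]).sum(axis=2)  # inverse-CDF sampling
    return np.minimum(idx, probs.shape[1] - 1)  # guard against rounding in the CDF

def log_prob(probs, X):
    """Log-density of each row of X under the factorized model."""
    return np.log(probs[np.arange(probs.shape[0]), X]).sum(axis=1)

def oracle_mean(X, target):
    """Toy oracle mean: number of positions that match a hidden target."""
    return (X == target).sum(axis=1).astype(float)

def prob_exceeds(mean, gamma):
    """P(y >= gamma | x) for a Gaussian oracle y ~ N(mean, SIGMA^2)."""
    z = (gamma - mean) / (SIGMA * math.sqrt(2.0))
    return 0.5 * np.vectorize(math.erfc)(z)

def adaptive_sampling(n_iters=10, n_samples=2000, quantile=0.8, seed=0):
    rng = np.random.default_rng(seed)
    target = rng.integers(0, ALPHABET, size=LENGTH)
    prior = np.full((LENGTH, ALPHABET), 1.0 / ALPHABET)  # stands in for p(x)
    probs, gamma = prior.copy(), -np.inf
    for _ in range(n_iters):
        X = sample(probs, n_samples, rng)
        mu = oracle_mean(X, target)
        # Raise the threshold gradually; it never decreases.
        gamma = max(gamma, float(np.quantile(mu, quantile)))
        # Importance weight: prior / current model, times P(y >= gamma | x).
        w = np.exp(log_prob(prior, X) - log_prob(probs, X)) * prob_exceeds(mu, gamma)
        w = w / w.sum()
        # Weighted maximum likelihood for the factorized model, with smoothing.
        counts = np.full((LENGTH, ALPHABET), 1e-3)
        for a in range(ALPHABET):
            counts[:, a] += (w[:, None] * (X == a)).sum(axis=0)
        probs = counts / counts.sum(axis=1, keepdims=True)
    return prior, probs, target
```

Calling `prior, probs, target = adaptive_sampling()` returns the fitted per-position probabilities. Because the density ratio keeps the fit anchored to the prior, the model approximates the prior conditioned on the property being high rather than simply maximizing the oracle, which is the role the abstract gives to prior knowledge about realistic inputs.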

David H. Brookes, Hahnbeom Park, Jennifer Listgarten • 2019

Related benchmarks

Task                              Dataset                                  Metric                   Result  Rank
Discrete Optimization             TF Bind 10                               Median Normalized Score  0.463   16
Neural Architecture Search        NAS                                      Median Normalized Score  0.292   16
Offline Model-Based Optimization  Ant Morphology (test)                    Median Normalized Score  0.384   16
Offline Model-Based Optimization  D'Kitty Morphology (test)                Median Normalized Score  0.753   16
Discrete Optimization             TF Bind 8                                Median Normalized Score  42.8    16
Offline Model-Based Optimization  Hopper Controller (test)                 Median Normalized Score  0.015   16
Offline Model-Based Optimization  Superconductor (test)                    Median Normalized Score  0.111   16
Offline Model-Based Optimization  Design-bench 100th percentile v1 (test)  GFP Score                3.408   7
Protein Stability Design          ACE2                                     ddG Mean                 -4.31   7
Offline Model-Based Optimization  Design-bench (test)                      GFP Score                3.269   6