Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences

About

We study the problem of optimizing biological sequences, e.g., proteins, DNA, and RNA, to maximize a black-box score function that is evaluated only on an offline dataset. We propose a novel solution, the bootstrapped training of score-conditioned generator (BootGen) algorithm. Our algorithm repeats a two-stage process. In the first stage, it trains the biological sequence generator with rank-based weights to improve the accuracy of sequence generation conditioned on high scores. The second stage is bootstrapping, which augments the training dataset with self-generated data labeled by a proxy score function. Our key idea is to align the score-based generation with a proxy score function, which distills the knowledge of the proxy score function into the generator. After training, we aggregate samples from multiple bootstrapped generators and proxies to produce diverse designs. Extensive experiments show that our method outperforms competitive baselines on biological sequence design tasks. We provide reproducible source code: https://github.com/kaist-silab/bootgen.
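The two-stage loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The weighting formula (weights proportional to 1 / (k·N + rank)) is one common form of rank-based weighting and is an assumption here, since the abstract does not give the formula. Keeping only the top-scoring generated samples is also an assumption, and `toy_generate` and `toy_proxy` are hypothetical stand-ins for a trained score-conditioned generator and a learned proxy.

```python
import random


def rank_based_weights(scores, k=0.01):
    """Return normalized weights that favor high-scoring samples.

    Assumed formula: weight proportional to 1 / (k * N + rank), where
    rank 0 is the highest score. The abstract does not specify this.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    raw = [1.0 / (k * n + r) for r in ranks]
    total = sum(raw)
    return [w / total for w in raw]


def bootstrap_round(dataset, generate, proxy, n_samples=8, top_k=2):
    """Augment the dataset with self-generated, proxy-labeled sequences.

    Keeping only the top_k proxy-scored candidates is an assumption.
    """
    candidates = [generate() for _ in range(n_samples)]
    labeled = [(seq, proxy(seq)) for seq in candidates]
    labeled.sort(key=lambda pair: pair[1], reverse=True)
    return dataset + labeled[:top_k]


# Hypothetical stand-ins: a random DNA sampler and a G-content proxy.
rng = random.Random(0)


def toy_generate():
    return "".join(rng.choice("ACGT") for _ in range(8))


def toy_proxy(seq):
    return seq.count("G") / len(seq)


# Offline dataset of (sequence, score) pairs.
dataset = [("ACGTACGT", 0.25), ("GGGTACGT", 0.5), ("AAAAAAAA", 0.0)]

# Stage 1: weights that a weighted training loss would use (training omitted).
weights = rank_based_weights([score for _, score in dataset])

# Stage 2: bootstrapping with proxy-labeled samples.
augmented = bootstrap_round(dataset, toy_generate, toy_proxy)
```

In a real pipeline the weights would scale each sample's contribution to the generator's training loss, and the two stages would alternate for several rounds before samples from multiple generators and proxies are aggregated.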

Minsu Kim, Federico Berto, Sungsoo Ahn, Jinkyoo Park • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Model-Based Optimization | UTR | 90th Percentile Oracle Score | 7.74 | 17 |
| Offline Model-Based Optimization | D'Kitty | Oracle Score (90th Pctl) | 0.62 | 17 |
| Offline Model-Based Optimization | GFP | 90th Percentile Oracle Score | 3.6 | 17 |
| Offline Model-Based Optimization | ChEMBL | 90th Percentile Oracle Score | 0.61 | 17 |
| Offline Model-Based Optimization | TF Bind 8 | 90th Percentile Oracle Score | 38.8 | 17 |
| Model-Based Optimization | Design-Bench 2022 (test) | TF-Bind-8 Score | 0.979 | 16 |
| Model-Based Optimization | Design-Bench | LogP | -13 | 16 |
| Offline Model-Based Optimization | LogP | 90th Percentile Oracle Score | -116.8 | 16 |
| Offline Model-Based Optimization | Warfarin | 90th Percentile Oracle Score | 549 | 15 |