
BioBO: Biology-informed Bayesian Optimization for Perturbation Design

About

Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25-40%, and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.
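As a rough illustration of how a biological prior can bias an acquisition function while still leaving room for exploration, the sketch below adds a decaying log-prior bonus to a standard upper-confidence-bound (UCB) score. This is a generic prior-weighted acquisition rule, not BioBO's actual formulation, which the abstract does not specify. The function names, the `kappa` and `beta` parameters, and the `beta / step` decay schedule are all assumptions made for illustration; `priors` stands in for any per-gene prior score in (0, 1], for example one derived from enrichment analysis.

```python
import math


def prior_weighted_ucb(means, stds, priors, step, kappa=2.0, beta=5.0):
    """Score candidate genes with UCB plus a decaying log-prior bonus.

    means, stds: surrogate posterior mean and standard deviation per gene.
    priors:      hypothetical per-gene prior scores in (0, 1].
    step:        current BO round (1-based); the prior's weight is beta / step,
                 so it dominates early rounds and fades as data accumulates.
    """
    eps = 1e-12  # guards against log(0) for genes with zero prior mass
    prior_weight = beta / step
    return [
        m + kappa * s + prior_weight * math.log(p + eps)
        for m, s, p in zip(means, stds, priors)
    ]


def select_batch(scores, batch_size):
    """Return indices of the batch_size highest-scoring candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Two candidate genes: gene 0 has a strong prior but low uncertainty,
# gene 1 has a weak prior but high uncertainty.
means, stds, priors = [0.5, 0.5], [0.1, 1.0], [0.9, 0.1]
early_pick = select_batch(prior_weighted_ucb(means, stds, priors, step=1), 1)
late_pick = select_batch(prior_weighted_ucb(means, stds, priors, step=100), 1)
```

In this toy setting the prior steers the first round toward gene 0, while by round 100 the prior's influence has decayed enough that the more uncertain gene 1 is selected instead, which mirrors the exploit-then-explore balance the abstract describes.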

Yanke Li, Tianyu Cui, Tommaso Mansi, Mangal Prakash, Rui Liao • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| IFN-γ Phenotype Optimization | Fusion | Cumulative Top-k Recall | 10.9 | 20 |
| IFN-γ Phenotype Optimization | Achilles | Cumulative Top-k Recall | 9.8 | 20 |
| IFN-γ Phenotype Optimization | GenePT | Cumulative Top-k Recall | 10.1 | 20 |
| IFN-γ Phenotype Optimization | Gene2Vec | Cumulative Top-k Recall | 10.3 | 20 |
| IL-2 Phenotype Optimization | Fusion | Cumulative Top-k Recall | 17.8 | 20 |
| IL-2 Phenotype Optimization | Achilles | Cumulative Top-k Recall | 16.3 | 20 |
| IL-2 Phenotype Optimization | GenePT | Cumulative Recall@k | 13.9 | 20 |
| IL-2 Phenotype Optimization | Gene2Vec | Cumulative Top-k Recall | 13.3 | 20 |
| Enrichment analysis | IL-2 immune-cell CRISPR perturbation dataset (test) | Overlap | 179 | 8 |
| Enrichment analysis | IFN-gamma phenotype dataset | Overlap Count | 187 | 8 |
