Reading the Cell, Designing the Cure: Perturbation-Conditioned Molecular Diffusion for Function-Oriented Drug Design
About
When reliable target structures are unavailable at scale or phenotypes arise from dysregulated pathways, transcriptomic perturbations provide a system-level functional readout for drug action. In this work, we formalize *Transcriptome-based Drug Design (TBDD)* as a generative inverse problem: designing drug molecules conditioned on desired transcriptomic state transitions. We analyze the inherently ill-posed nature of this task, which is further complicated by the profound domain gap between biology and chemistry and by the sparsity of transcriptomic signals. To address these challenges, we propose **CURE** (A **C**ell**U**lar **R**esponse **E**ngine), a multi-resolution transcriptome-guided diffusion framework. CURE features a specialized **Transcriptome Perturbation Functional Feature Extractor (TFE)** that (1) distills function-oriented perturbation embeddings from pre/post states, (2) aligns these signatures to dual chemical views to bridge the cross-modal gap, and (3) performs heterogeneity-aware aggregation to extract robust state-specific signals from noisy transcriptomic data. Extensive evaluations on both standard benchmarks and rigorous out-of-distribution protocols demonstrate that CURE consistently outperforms strong baselines in structural quality and functional consistency. Furthermore, we validate its practical utility via a zero-shot gene-inhibitor design task, highlighting the potential of phenotype-driven generative discovery.
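One way to read the inverse-problem framing is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $x^{\mathrm{pre}}, x^{\mathrm{post}} \in \mathbb{R}^{G}$ are expression profiles over $G$ genes before and after treatment, $m$ is a molecule, and $p_\theta$ is a learned generative model.

```latex
% Forward (biological) direction: a molecule induces a state transition.
x^{\mathrm{post}} \sim p\left(x^{\mathrm{post}} \mid x^{\mathrm{pre}},\, m\right)

% Inverse (design) direction: sample molecules consistent with a desired transition.
m \sim p_\theta\left(m \mid x^{\mathrm{pre}},\, x^{\mathrm{post}}\right),
\qquad
p\left(m \mid x^{\mathrm{pre}}, x^{\mathrm{post}}\right)
\;\propto\;
p\left(x^{\mathrm{post}} \mid x^{\mathrm{pre}}, m\right)\, p\left(m \mid x^{\mathrm{pre}}\right)
```

Under this reading, the task is ill-posed because many structurally different molecules can produce similar transitions, so the conditional distribution over $m$ is broad and multimodal rather than concentrated on a single answer.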
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Zero-shot Gene Inhibitor Prediction | 10 canonical genes inhibitor libraries (Unseen gene knockout) | Morgan Fingerprint Similarity | 88.1 | 40 |
| Toxicity Property Prediction | L1000 (test) | Ames Mutagenicity | 48.5 | 5 |
| Molecular Generation | L1000 Bulk In-Distribution | Coverage | 100 | 4 |
| Molecular Generation | L1000 Bulk Out-of-Distribution - Unseen Cells | Coverage | 90.9 | 4 |
| Molecular Generation | L1000 Bulk Out-of-Distribution - Unseen Drugs | Coverage | 90.9 | 4 |
| Molecular Generation | Tahoe Single-cell In-Distribution 100M | Coverage | 90.9 | 4 |
| Molecular Generation | Tahoe-100M Single-cell Out-of-Distribution - Unseen Cells | Coverage | 90.9 | 4 |
| Molecular Generation | Tahoe-100M Single-cell (Out-of-Distribution - Unseen Drugs) | Coverage | 90.9 | 4 |
| Molecule Generation | Bulk in-distribution | Unique Scaffold Ratio | 63.6 | 4 |
| Binding affinity prediction | HDAC1 PDB 4BKX | Binding Affinity | 8.68 | 3 |
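For readers unfamiliar with the similarity and scaffold metrics named in the table, the following is a minimal sketch of how such quantities are commonly computed. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: fingerprints are represented as plain sets of on-bit indices (in practice a cheminformatics toolkit would produce them), and the helper names `tanimoto` and `unique_ratio` are hypothetical. The definition of the unique scaffold ratio as distinct scaffolds divided by generated molecules is likewise an assumption.

```python
from typing import Hashable, Iterable, Sequence


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints.

    Each fingerprint is given as the collection of its on-bit indices.
    Two empty fingerprints are treated as having similarity 0.0 (a convention
    chosen here to avoid dividing by zero).
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def unique_ratio(items: Sequence[Hashable]) -> float:
    """Fraction of distinct entries, e.g. scaffold strings of generated molecules.

    Assumed definition: number of distinct items / total number of items.
    """
    if not items:
        return 0.0
    return len(set(items)) / len(items)


if __name__ == "__main__":
    # Toy fingerprints: on-bit indices of a generated and a reference molecule.
    generated = {1, 2, 3}
    reference = {2, 3, 4}
    print(tanimoto(generated, reference))

    # Toy scaffold identifiers for three generated molecules.
    print(unique_ratio(["scaffold_a", "scaffold_a", "scaffold_b"]))
```

Working on sets of bit indices keeps the sketch free of external dependencies; with a real toolkit, only the fingerprint and scaffold extraction steps would change, while the ratios are computed the same way.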