HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts
About
Single-cell perturbation studies face dual heterogeneity bottlenecks: (i) semantic heterogeneity, where identical biological concepts are encoded under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity, where distribution shifts arising from biological variation demand dataset-specific inductive biases. We propose HarmonyCell, an end-to-end agent framework that resolves each challenge through a dedicated mechanism: an LLM-driven Semantic Unifier autonomously maps disparate metadata into a canonical interface without manual intervention, and an adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures with statistical inductive biases suited to the distribution shift at hand. Evaluated across diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieves a 95% valid execution rate on heterogeneous input datasets (versus 0% for general agents) while matching or exceeding expert-designed baselines in rigorous out-of-distribution evaluations. This dual-track orchestration enables scalable automatic virtual cell modeling without dataset-specific engineering.
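To make the semantic-heterogeneity problem concrete, the sketch below shows two records that describe the same experiment under different column names being mapped onto one canonical schema. It is only an illustration: the abstract describes an LLM-driven Semantic Unifier, whereas this example swaps in a hand-written alias table, and the canonical field names, the aliases, and the `unify_metadata` helper are all hypothetical rather than taken from HarmonyCell.

```python
# Hypothetical canonical fields and aliases, chosen for illustration only.
# A fixed alias table stands in for the LLM-driven mapping described above.
CANONICAL_ALIASES = {
    "perturbation": {"perturbation", "condition", "guide_identity", "treatment"},
    "cell_type": {"cell_type", "celltype", "cell_line"},
    "dose": {"dose", "dosage", "dose_value"},
}

# Invert the table once: alias -> canonical field name.
ALIAS_LOOKUP = {
    alias: canonical
    for canonical, aliases in CANONICAL_ALIASES.items()
    for alias in aliases
}


def unify_metadata(record: dict) -> dict:
    """Rename known aliases to canonical keys; leave unknown keys unchanged."""
    unified = {}
    for key, value in record.items():
        canonical = ALIAS_LOOKUP.get(key.strip().lower(), key)
        if canonical in unified:
            # Two source columns claim the same canonical field.
            raise ValueError(f"ambiguous mapping for field {canonical!r}")
        unified[canonical] = value
    return unified


# Two datasets encode the same concepts under different schemas.
record_a = unify_metadata({"condition": "KLF1", "celltype": "K562"})
record_b = unify_metadata({"guide_identity": "KLF1", "cell_line": "K562"})
print(record_a == record_b)
```

A static table like this fails on any column name it has not seen before, which is presumably the gap an LLM-based mapper is meant to close.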
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Unseen Perturbation Prediction | Norman 2019 (OOD) | CosLogFC | 0.61 | 4 |
| Unseen Cell Prediction | Srivatsan-Sciplex3 2020 (OOD) | CosLogFC | 0.1 | 4 |
| Unseen Perturbation Prediction | Adamson 2016 (OOD) | CosLogFC | 0.32 | 4 |
| Unseen Perturbation Prediction | Srivatsan-Sciplex2 2020 (OOD) | CosLogFC | 0.06 | 4 |
| Virtual Cell Modeling | 20 virtual cell modeling trials | Preprocess Error | 0.00e+0 | 3 |
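The table reports CosLogFC without defining it. A common reading, assumed here rather than confirmed by this page, is the cosine similarity between the predicted and the observed log fold change of mean expression relative to control. The sketch below implements that assumed definition; the function names, the base-2 logarithm, and the pseudocount of 1 are illustrative choices, not the benchmark's specification.

```python
import math


def log_fold_change(perturbed_mean, control_mean, pseudocount=1.0):
    """Per-gene log2 fold change of perturbed vs. control mean expression."""
    return [
        math.log2((p + pseudocount) / (c + pseudocount))
        for p, c in zip(perturbed_mean, control_mean)
    ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cos_logfc(pred_mean, true_mean, control_mean, pseudocount=1.0):
    """Assumed CosLogFC: cosine between predicted and observed log fold changes."""
    pred_lfc = log_fold_change(pred_mean, control_mean, pseudocount)
    true_lfc = log_fold_change(true_mean, control_mean, pseudocount)
    return cosine(pred_lfc, true_lfc)


# Toy example with two genes (values are made up for illustration).
control = [1.0, 1.0]
observed = [3.0, 3.0]
score_same = cos_logfc(observed, observed, control)      # prediction matches observation
score_opposite = cos_logfc([0.0, 0.0], observed, control)  # predicted change has the wrong sign
```

Under this reading the score ranges from -1 to 1, and a model that simply predicts the control profile scores 0 because its predicted fold change is the zero vector.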