HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

About

Single-cell perturbation studies face dual heterogeneity bottlenecks: (i) semantic heterogeneity, where identical biological concepts are encoded under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity, where distribution shifts from biological variation demand dataset-specific inductive biases. We propose HarmonyCell, an end-to-end agent framework resolving each challenge through a dedicated mechanism: an LLM-driven Semantic Unifier autonomously maps disparate metadata into a canonical interface without manual intervention; and an adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures with optimal statistical inductive biases for distribution shifts. Evaluated across diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieves a 95% valid execution rate on heterogeneous input datasets (versus 0% for general agents) while matching or even exceeding expert-designed baselines in rigorous out-of-distribution evaluations. This dual-track orchestration enables scalable automatic virtual cell modeling without dataset-specific engineering.
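
To make the semantic heterogeneity problem concrete, the following is a minimal sketch of mapping dataset-specific metadata column names onto one canonical interface. It is not HarmonyCell's Semantic Unifier: the abstract describes an LLM-driven mapper, whereas this sketch substitutes a hand-written alias table. The canonical field names, the aliases, and the function `unify_columns` are all hypothetical and chosen only for illustration.

```python
# Hypothetical alias table standing in for the LLM-driven mapping step.
# Neither the canonical field names nor the aliases come from the paper.
ALIASES = {
    "perturbation": {"perturbation", "condition", "guide_identity", "treatment"},
    "cell_type": {"cell_type", "celltype", "cell_line"},
    "dose": {"dose", "dose_value", "concentration"},
}


def unify_columns(columns):
    """Map a dataset's metadata column names onto canonical fields.

    Returns (mapping, unmapped): mapping is {canonical_field: original_column},
    and unmapped lists the columns that matched no canonical field.
    """
    mapping = {}
    unmapped = []
    for col in columns:
        # Normalize case and separators before looking up aliases.
        key = col.strip().lower().replace(" ", "_").replace("-", "_")
        for canonical, names in ALIASES.items():
            if key in names and canonical not in mapping:
                mapping[canonical] = col
                break
        else:
            # No canonical field claimed this column.
            unmapped.append(col)
    return mapping, unmapped


mapping, unmapped = unify_columns(["guide_identity", "Cell Line", "n_counts"])
print(mapping)
print(unmapped)
```

A fixed alias table like this fails on any column name it has not seen before, which is presumably the gap an LLM-driven mapper is meant to close.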

Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Unseen Perturbation Prediction | Norman 2019 (OOD) | CosLogFC | 0.61 | 4 |
| Unseen Cell Prediction | Srivatsan-Sciplex3 2020 (OOD) | CosLogFC | 0.1 | 4 |
| Unseen Perturbation Prediction | Adamson 2016 (OOD) | CosLogFC | 0.32 | 4 |
| Unseen Perturbation Prediction | Srivatsan-Sciplex2 2020 (OOD) | CosLogFC | 0.06 | 4 |
| Virtual Cell Modeling | 20 virtual cell modeling trials | Preprocess Error | 0.00e+0 | 3 |
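
The page does not define CosLogFC. The sketch below assumes the reading suggested by the name: the cosine similarity between predicted and observed log fold changes, each taken relative to control expression. The function names, the use of per-gene mean expression, the base-2 logarithm, and the pseudocount of 1.0 are all assumptions, not details from the source.

```python
import math


def log_fold_change(perturbed_mean, control_mean, pseudocount=1.0):
    """Per-gene log2 fold change of perturbed over control mean expression.

    The pseudocount (an assumption) avoids taking the log of zero.
    """
    return [
        math.log2((p + pseudocount) / (c + pseudocount))
        for p, c in zip(perturbed_mean, control_mean)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Undefined for a zero vector; returning 0.0 is a convention chosen here.
        return 0.0
    return dot / (norm_u * norm_v)


def cos_logfc(pred_mean, true_mean, control_mean, pseudocount=1.0):
    """Assumed CosLogFC: cosine similarity of predicted vs. observed log fold changes."""
    pred_lfc = log_fold_change(pred_mean, control_mean, pseudocount)
    true_lfc = log_fold_change(true_mean, control_mean, pseudocount)
    return cosine_similarity(pred_lfc, true_lfc)


control = [1.0, 1.0, 1.0]
observed = [3.0, 1.0, 0.0]
print(cos_logfc(observed, observed, control))         # identical prediction
print(cos_logfc([0.0, 1.0, 3.0], observed, control))  # opposite direction
```

Under this assumed definition the score ranges from -1 to 1, and a prediction that matches the direction of the observed expression change scores close to 1 regardless of its magnitude.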