Representation-Guided Discrete Molecular Graph Retrosynthesis

About

Stochastic process-based molecular graph generators have become the state of the art for template-free single-step retrosynthesis. However, these models are typically trained only on product-reactant pairs, so they acquire chemistry-relevant representations only indirectly and implicitly. Meanwhile, recent advances in computer vision demonstrate that representation guidance can distill semantics from pretrained encoders into diffusion transformers (DiTs), substantially improving both convergence and generation quality. Whether similar gains extend to the retrosynthesis task, and which graph-specific design choices make them work, remain open questions. To address these questions, we conduct a systematic empirical study over a unified design space spanning teacher molecular representations, endpoint and granularity choices, injection depths in the denoiser, correspondence strategies, and guidance schemes. Guided by these findings, we develop Graph-oriented Representation Guidance (GRG), which achieves 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k while increasing diversity to 15.5, both substantially outperforming the adopted base generator. Notably, GRG consistently improves all top-k metrics in out-of-distribution settings, suggesting that representation guidance facilitates the acquisition of intrinsic chemical semantics. Representation guidance also reduces the number of epochs needed to reach comparable performance by 35% and the wall-clock time by 30%. In addition, we introduce a simple yet effective representation-similarity-based reranking mechanism, which further improves the top of the ranked list without training an additional verifier.

Jiahai Huang, Anjie Qiao, Zhen Wang, Defu Lian, Yutong Lu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Molecular Graph Retrosynthesis | USPTO-50k (test) | Top-1 Accuracy: 58.6 | 7
Retrosynthesis | USPTO-50k standard (test) | Coverage k=1: 92.1 | 4
Retrosynthesis | USPTO-50k CLUSTER (test) | Top-1 Accuracy: 55.3 | 3
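
For readers unfamiliar with the top-k accuracy metric used in these benchmarks, the following is a minimal sketch of how it is commonly computed: a test example counts as a hit when the ground-truth reactant set appears among the first k ranked predictions. The function name `top_k_accuracy` and the toy strings are hypothetical and chosen for illustration only; the sketch assumes that predictions and targets are already canonicalized strings (for example canonical SMILES), and it is not taken from the paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose target is among the first k ranked predictions.

    Assumption: every string is already canonicalized, so exact string
    equality is a valid match test.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not targets:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(predictions, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Toy data (hypothetical placeholders, not real reactant sets).
toy_predictions = [["A", "B", "C"], ["X", "Y", "Z"], ["P", "Q", "R"]]
toy_targets = ["A", "Y", "M"]

top1 = top_k_accuracy(toy_predictions, toy_targets, k=1)
top3 = top_k_accuracy(toy_predictions, toy_targets, k=3)
```

On the toy data, only the first example is matched at k=1, and the first two are matched at k=3; the third target never appears in its ranked list.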