Monte Carlo Tree Diffusion with Multiple Experts for Protein Design
About
The goal of protein design is to generate amino acid sequences that fold into functional structures with desired properties. Prior methods that combine autoregressive language models with Monte Carlo Tree Search (MCTS) struggle with long-range dependencies and suffer from an impractically large search space. We propose MCTD-ME, Monte Carlo Tree Diffusion with Multiple Experts, which integrates masked diffusion models with tree search to enable multi-token planning and efficient exploration under the guidance of multiple experts. Unlike autoregressive planners, MCTD-ME uses biophysical-fidelity-enhanced diffusion denoising as the rollout engine, jointly revising multiple positions and scaling to large sequence spaces. It further leverages experts of varying capacities to enrich exploration, guided by a pLDDT-based masking schedule that targets low-confidence regions while preserving reliable residues. We also propose a novel multi-expert selection rule (PH-UCT-ME) that extends Shannon-entropy-based UCT to expert ensembles using mutual information. On lead optimization tasks, MCTD-ME achieves superior performance on the CAMEO and PDB benchmarks across inverse folding, folding, and conditional design challenges such as motif scaffolding. Our framework is model-agnostic, plug-and-play, and extensible to de novo protein engineering and beyond.
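The pLDDT-based masking idea described above can be illustrated with a minimal sketch. This is not the authors' schedule: the function name `plddt_mask`, the `<mask>` token, and the fixed `mask_fraction` are assumptions made for illustration. The sketch simply masks the residues with the lowest per-residue pLDDT and leaves the rest untouched, so that a masked diffusion model would only re-sample the low-confidence positions.

```python
import numpy as np

MASK = "<mask>"  # hypothetical mask token, not taken from the source


def plddt_mask(sequence, plddt, mask_fraction=0.2):
    """Mask the lowest-confidence residues of a sequence.

    sequence      : string of one-letter amino acid codes
    plddt         : per-residue pLDDT scores, same length as sequence
    mask_fraction : assumed fraction of residues to mask (illustrative)

    Returns the token list and the sorted masked positions.
    """
    if len(sequence) != len(plddt):
        raise ValueError("sequence and plddt must have the same length")
    if not sequence:
        raise ValueError("sequence must be non-empty")

    # Always mask at least one residue so a rollout has something to revise.
    n_mask = max(1, int(round(mask_fraction * len(sequence))))

    # Stable sort so that ties are resolved by sequence position.
    order = np.argsort(np.asarray(plddt, dtype=float), kind="stable")
    masked = set(order[:n_mask].tolist())

    tokens = [MASK if i in masked else aa for i, aa in enumerate(sequence)]
    return tokens, sorted(masked)


seq = "MKTAYIAKQR"
scores = [90, 85, 40, 95, 30, 88, 92, 55, 91, 87]
tokens, positions = plddt_mask(seq, scores, mask_fraction=0.3)
```

In a tree search, each expansion could call a function like this with a different `mask_fraction` to trade off exploration against preserving reliable residues; how MCTD-ME actually sets this schedule is not specified here.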
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Lead Optimization in Protein Folding | CAMEO 183 targets 2022 | RMSD (Base): 10.5 | 8 |
| Lead Optimization in Protein Folding | PDB date-split (449 targets) | RMSD (Base): 8.45 | 8 |
| Inverse folding | CAMEO benchmark 2022 | AAR: 46.67 | 7 |
| Inverse folding | PDB (date-split) | AAR: 52.44 | 7 |
| Forward folding | CAMEO subset (n=163) | TM-score: 0.732 | 4 |
| Motif Scaffolding | 24 curated motifs (summary) | -- | 4 |
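
For readers unfamiliar with the inverse folding metric, AAR (amino acid recovery) is commonly computed as the percentage of positions at which the designed sequence matches the native one. The sketch below assumes that definition and assumes the values in the table are percentages; the function name `amino_acid_recovery` is illustrative and the benchmark's exact evaluation protocol may differ.

```python
def amino_acid_recovery(designed, native):
    """Percentage of positions where the designed residue equals the native residue.

    Both arguments are strings of one-letter amino acid codes with equal length.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")

    matches = sum(d == n for d, n in zip(designed, native))
    return 100.0 * matches / len(native)


# Two of the eight positions differ (A vs G, I vs L), so six of eight match.
aar = amino_acid_recovery("MKTAYIAK", "MKTGYLAK")
```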