
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

About

Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) existing methods cannot jointly learn protein geometry and design tasks, a gap that pretraining can address; (2) current pretraining methods mostly rely on local, non-rigid atomic representations aimed at downstream property-prediction tasks, which limits the global geometric understanding needed for protein generation; and (3) existing approaches have yet to effectively model the rich dynamic and conformational information of protein structures. To overcome these issues, we introduce $\textbf{RigidSSL}$ ($\textit{Rigidity-Aware Self-Supervised Learning}$), a geometric pretraining framework that front-loads geometry learning prior to generative finetuning. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures from the AlphaFold Protein Structure Database with simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions. Underpinning both phases is a bi-directional, rigidity-aware flow matching objective that jointly optimizes translational and rotational dynamics to maximize mutual information between conformations. Empirically, RigidSSL variants improve designability by up to 43% while enhancing novelty and diversity in unconditional generation. Furthermore, RigidSSL-Perturb improves the success rate by 5.8% in zero-shot motif scaffolding, and RigidSSL-MD captures more biophysically realistic conformational ensembles in G protein-coupled receptor modeling.
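
The abstract does not spell out the flow matching objective, so the following is only a minimal sketch of what jointly handling translations and rotations of a rigid frame could look like. It assumes linear interpolation for the translation and geodesic interpolation on SO(3) for the rotation, which is a common choice in frame-based flow matching and not necessarily the authors' formulation. All function names (`hat`, `so3_exp`, `so3_log`, `interpolate_frame`) are hypothetical.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3) + hat(w)  # first-order approximation near identity
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def so3_log(R):
    """Rotation matrix -> rotation vector (valid for angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee

def interpolate_frame(x0, R0, x1, R1, t):
    """Interpolate a rigid frame (x, R) between two conformations at time t.

    Assumption: straight line for translations, geodesic for rotations.
    Returns the interpolated frame and the constant target velocities
    (translational, and rotational expressed in the body frame).
    """
    w = so3_log(R0.T @ R1)          # relative rotation as a rotation vector
    x_t = (1.0 - t) * x0 + t * x1   # translational path
    R_t = R0 @ so3_exp(t * w)       # rotational path along the geodesic
    return x_t, R_t, x1 - x0, w

# Toy pair of frames (illustrative values only, not data from the paper).
x0, R0 = np.zeros(3), np.eye(3)
x1, R1 = np.array([2.0, 0.0, 0.0]), so3_exp(np.array([0.0, 0.0, np.pi / 2]))
x_t, R_t, v_target, w_target = interpolate_frame(x0, R0, x1, R1, 0.5)
```

In a sketch like this, a network would be trained to regress `v_target` and `w_target` from the interpolated frame `(x_t, R_t)` and `t`. How the two directions of the bi-directional objective are combined is not stated in the abstract and is left out here.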

Zhanghan Ni, Yanjing Li, Zeju Qiu, Bernhard Schölkopf, Hongyu Guo, Weiyang Liu, Shengchao Liu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Protein Structure Generation | PDB | FPSD: 776.3 | 12 |
| Protein Structure Generation | AFDB | FPSD: 701.7 | 12 |
| Unconditional protein structure generation | PDB | Fraction (scRMSD ≤ 2.0 Å): 87.5 | 12 |
| GPCR conformational ensemble generation | GPCR MD ensembles | Pairwise RMSD: 2.2 | 6 |
| Protein Structure Generation | Protein Sequences 700-residue | Clashscore: 21.36 | 6 |
| Protein Structure Generation | 800-residue protein sequences | Clashscore: 26.42 | 6 |
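
The "Fraction (scRMSD ≤ 2.0 Å)" result is a thresholded fraction over generated samples. As a rough illustration, assuming per-sample scRMSD values have already been computed by some refolding pipeline (not shown, and not described on this page), the fraction could be tallied as below. The function name `designable_fraction` and the example values are hypothetical.

```python
def designable_fraction(sc_rmsds, threshold=2.0):
    """Fraction of samples whose scRMSD (in angstroms) is at or below threshold."""
    values = list(sc_rmsds)
    if not values:
        raise ValueError("need at least one scRMSD value")
    return sum(1 for v in values if v <= threshold) / len(values)

# Made-up scRMSD values for eight generated backbones (not from the paper).
example = [0.8, 1.5, 2.0, 2.4, 1.1, 3.7, 1.9, 0.6]
print(f"{100 * designable_fraction(example):.1f}%")  # prints 75.0%
```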
