AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design

About

Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external biophysical feedback or redirect generation when a candidate violates thermodynamic or structural constraints. We introduce AgentPLM, which addresses this by equipping a pre-trained PLM with (i) Reasoning-Augmented Decoding (RAD), which interleaves autoregressive generation with tool calls (ESMFold, FoldX, AutoDock Vina), and (ii) Contrastive Agent Policy Optimisation (CAPO), a trajectory-level extension of direct preference optimisation that trains the policy end-to-end to learn when oracle feedback is informative rather than merely imitating high-fitness sequences. We evaluate AgentPLM on benchmark tasks spanning de novo enzyme design, antibody optimisation, thermostability, protein-protein interaction (PPI) interface design, and zero-shot fitness prediction, using standardised oracle APIs and controlled sequence-identity splits. AgentPLM achieves state-of-the-art results, including a gain in antibody top-10% hit rate over the strongest passive baseline, and provides mechanistic evidence of online error correction without explicit backtracking.

Sahil Rahman, Maxx Richard Rahman • 2026
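
The abstract describes CAPO only as a trajectory-level extension of direct preference optimisation and does not give the objective itself. As a rough illustration of what "trajectory-level" could mean, the sketch below evaluates the standard DPO loss on whole trajectories by summing per-step log-probabilities. This is an assumption for illustration, not the authors' formulation: the function names, the `beta` default, and the idea that a trajectory is a flat list of per-step log-probabilities (covering both generated residues and tool-call decisions) are all hypothetical.

```python
import math
from typing import Sequence


def trajectory_logprob(step_logprobs: Sequence[float]) -> float:
    """Log-probability of a full trajectory: the sum of its per-step log-probs."""
    return sum(step_logprobs)


def dpo_trajectory_loss(
    policy_preferred: Sequence[float],
    policy_rejected: Sequence[float],
    ref_preferred: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # hypothetical default; the source does not state a value
) -> float:
    """Standard DPO loss applied to a (preferred, rejected) trajectory pair.

    Each argument holds per-step log-probabilities of one trajectory under
    either the trainable policy or the frozen reference model.
    """
    # Implicit reward of each trajectory: policy log-prob minus reference log-prob.
    reward_preferred = trajectory_logprob(policy_preferred) - trajectory_logprob(ref_preferred)
    reward_rejected = trajectory_logprob(policy_rejected) - trajectory_logprob(ref_rejected)
    z = beta * (reward_preferred - reward_rejected)
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; it falls below that value as the policy shifts probability mass towards the preferred trajectory relative to the reference.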

Related benchmarks

Task	Dataset	Metric	Result
Antibody optimization	AntibodyOpt	Top-10% Hit Rate	52.41	6
Enzyme Design	EnzyDes	Normalised kcat/Km Ratio	1.89	6
Thermostability improvement	ThermoStab	Mean Tm Improvement (°C)	7.64	6
Zero-shot fitness prediction	DMS	Spearman Correlation	0.61	6
