
Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion

About

Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as powerful tools for antibody sequence design, existing approaches largely suffer from two key limitations: they predominantly memorize germline sequences rather than modeling biologically meaningful somatic variation, and they offer limited support for flexible classifier-guided conditional generation. We address these challenges through two primary contributions. First, we demonstrate that discrete diffusion fine-tuning achieves strong language modeling performance on antibody sequences while allowing for generation conditioned on any off-the-shelf classifier. Second, we introduce germline-absorbing diffusion, a novel modification of the discrete diffusion noise process in which the germline sequence, rather than a masked sequence, serves as the absorbing state. This biologically motivated inductive bias restricts the model to learning the trajectory from germline to observed sequence, effectively excluding genetic variation and V(D)J recombination statistics from the learned distribution and dramatically mitigating germline bias. We show that germline diffusion improves non-germline residue prediction accuracy from 26 percent to 46 percent, approaching the theoretical upper bound set by true biological variability. We then demonstrate the utility of our germline diffusion model on two conditional generation tasks: sampling antibodies with improved hydrophobicity and with improved predicted binding affinity. On both tasks our model shows an improved tradeoff between class adherence and sample quality, significantly outperforming EvoProtGrad, a popular strategy for sampling from pLMs with gradient-based discrete Markov Chain Monte Carlo.

Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez, Nina Cheng, Melih Yilmaz • 2026
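
The noise process described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear schedule, the function names, the `#` mask symbol, and the toy sequences below are all assumptions made for illustration. The only idea taken from the abstract is that each position is absorbed into its germline residue instead of into a mask token.

```python
import random

MASK = "#"  # hypothetical mask symbol, used only for the standard absorbing process


def absorb(observed, absorbing, t, rng):
    """Forward-noise `observed` at noise level t in [0, 1].

    Each position independently jumps to the token at the same position in
    `absorbing` with probability t (a linear schedule, assumed for illustration).
    """
    if len(observed) != len(absorbing):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        a if rng.random() < t else o for o, a in zip(observed, absorbing)
    )


def mask_noise(observed, t, rng):
    """Standard absorbing diffusion: the absorbing state is an all-mask sequence."""
    return absorb(observed, MASK * len(observed), t, rng)


def germline_noise(observed, germline, t, rng):
    """Germline-absorbing variant: the absorbing state is the aligned germline."""
    return absorb(observed, germline, t, rng)


# Toy, made-up aligned sequences (not real antibody data); they differ at 3 positions.
germline = "QVQLVQSGAEVKKPGASVKVSCKAS"
observed = "QVQLVQSGAEVRKPGSSVKVSCKAT"

rng = random.Random(0)
half_masked = mask_noise(observed, 0.5, rng)
half_germline = germline_noise(observed, germline, 0.5, rng)
print(half_masked)
print(half_germline)
```

Under this sketch, positions where the observed sequence already matches the germline are unchanged at every noise level, so a denoiser trained on such pairs only has to recover the mutated positions. With mask noise, the denoiser must also recover the germline residues themselves.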

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Hydrophobicity conditional generation | OAS | Energy (kcal/mol) | -0.063 | 13 |
| HGFR Binding conditional generation | HGFR Binding | p(bind) Classification Score | 0.8 | 9 |
| V-gene class conditional generation | OAS | Class Adherence | 84 | 9 |
| Non-germline residue prediction | OAS held-out germlines (test) | Non-germline Accuracy | 46.3 | 9 |
| Language Modeling | OAS held-out germlines (test) | Perplexity | 1.293 | 7 |
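
This page does not define the non-germline accuracy metric. One plausible reading, assumed here, is accuracy restricted to positions where the observed sequence differs from its germline. The function name and toy sequences in this sketch are hypothetical.

```python
def non_germline_accuracy(predicted, observed, germline):
    """Fraction of mutated positions (observed != germline) predicted correctly.

    Assumes the three sequences are aligned to the same length. Returns NaN
    when the observed sequence carries no mutations.
    """
    if not (len(predicted) == len(observed) == len(germline)):
        raise ValueError("sequences must be aligned to the same length")
    mutated = [i for i, (o, g) in enumerate(zip(observed, germline)) if o != g]
    if not mutated:
        return float("nan")
    correct = sum(predicted[i] == observed[i] for i in mutated)
    return correct / len(mutated)


# Toy, made-up aligned sequences; the observed one has mutations at indices 0, 4, 9.
germline = "KKPGASVKAS"
observed = "RKPGSSVKAT"
predicted = "RKPGASVKAS"  # recovers only the mutation at index 0

score = non_germline_accuracy(predicted, observed, germline)
print(f"{score:.3f}")
```

Under this definition, a model that simply copies the germline scores 0, which is why the metric is sensitive to the germline bias the abstract describes.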
