
Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention

About

Most protein families have fewer than 100 known members, a regime where deep generative models overfit or collapse. We propose stochastic attention (SA), a training-free sampler that treats the modern Hopfield energy over a protein alignment as a Boltzmann distribution and draws samples via Langevin dynamics. The score function is a closed-form softmax attention operation requiring no training, no pretraining data, and no GPU, with cost linear in alignment size. Across eight Pfam families, SA generates sequences with low amino acid compositional divergence, substantial novelty, and structural plausibility confirmed by ESMFold and AlphaFold2. Generated sequences fold more faithfully to canonical family structures than natural members in six of eight families. Against profile HMMs, EvoDiff, and the MSA Transformer, which produce sequences that drift far outside the family, SA maintains 51 to 66 percent identity while remaining novel, in seconds on a laptop. The critical temperature governing generation is predicted from PCA dimensionality alone, enabling fully automatic operation. Controls confirm SA encodes correlated substitution patterns, not just per-position amino acid frequencies.
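The sampler described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the author's implementation. It assumes the standard modern Hopfield energy E(x) = -(1/β) log Σ_i exp(β m_i·x) + ½‖x‖², whose negative gradient is the softmax attention readout minus x, and an unadjusted Langevin update targeting exp(-βE). The one-hot encoding, the alphabet `AMINO`, the function names, and all parameter values are hypothetical choices made for illustration; the abstract's PCA-based choice of critical temperature is not reproduced here.

```python
import numpy as np

# Hypothetical alphabet: 20 amino acids plus a gap symbol.
AMINO = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(seqs):
    """Encode aligned, equal-length sequences as flattened one-hot rows."""
    index = {a: i for i, a in enumerate(AMINO)}
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(AMINO)))
    for n, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            X[n, pos * len(AMINO) + index[aa]] = 1.0
    return X


def attention_score(x, X, beta):
    """Return -grad E(x) = X^T softmax(beta * X x) - x (assumed energy)."""
    logits = beta * (X @ x)       # one dot product per stored sequence
    logits -= logits.max()        # stabilise the softmax numerically
    weights = np.exp(logits)
    weights /= weights.sum()
    return X.T @ weights - x


def langevin_sample(X, beta, step, n_steps, rng):
    """Unadjusted Langevin dynamics targeting exp(-beta * E)."""
    x = X[rng.integers(len(X))].copy()   # start from a random family member
    noise_scale = np.sqrt(2.0 * step / beta)
    for _ in range(n_steps):
        x = x + step * attention_score(x, X, beta)
        x = x + noise_scale * rng.standard_normal(x.shape)
    return x


def decode(x, length):
    """Map a continuous state back to a sequence by per-position argmax."""
    return "".join(AMINO[i] for i in x.reshape(length, len(AMINO)).argmax(axis=1))
```

Each step costs one matrix-vector product with the alignment matrix, which is where the linear cost in alignment size comes from. With a toy alignment `seqs`, a sample could be drawn as `decode(langevin_sample(one_hot(seqs), beta=2.0, step=0.05, n_steps=200, rng=np.random.default_rng(0)), len(seqs[0]))`; the values of `beta`, `step`, and `n_steps` here are arbitrary.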

Jeffrey D. Varner • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Protein Sequence Pseudo-Perplexity Evaluation | Pfam Protein Families RRM, SH3, WW, Kunitz, zf-C2H2, PDZ, Pkinase, Defensin_beta | Pseudo-Perplexity | 4.44 | 16 |
| Pairwise mutual information preservation | Kunitz | Pearson Correlation (r) | 0.8 | 6 |
| Pairwise mutual information preservation | SH3 | Pearson Correlation (r) | 0.66 | 6 |
| Sequence Generation | Pfam RRM family | KL Divergence (AA) | 0.06 | 5 |
| Sequence Generation | Pfam WW family | KL Divergence (AA) | 0.008 | 5 |
| Sequence Generation | Pfam Kunitz family | KL Divergence (AA) | 0.013 | 5 |
| Sequence Generation | Pfam PDZ family | KL Divergence (AA) | 0.038 | 5 |
| Sequence Generation | Pfam Pkinase family | KL Divergence (AA) | 0.035 | 5 |
| Pairwise mutual information preservation | WW | Pearson Correlation (r) | 0.71 | 5 |
| Pairwise mutual information preservation | zf-C2H2 | Pearson Correlation (r) | 0.92 | 5 |

Showing 10 of 19 rows
