One protein is all you need

About

Generalization beyond training data remains a central challenge in machine learning for biology. A common way to enhance generalization is self-supervised pre-training on large datasets. However, aiming to perform well on all possible proteins can limit a model's capacity to excel on any specific one, whereas experimentalists typically need accurate predictions for individual proteins they study, often not covered in training data. To address this limitation, we propose a method that enables self-supervised customization of protein language models to one target protein at a time, on the fly, and without assuming any additional data. We show that our Protein Test-Time Training (ProteinTTT) method consistently enhances generalization across different models, their sizes, and datasets. ProteinTTT improves structure prediction for challenging targets, achieves new state-of-the-art results on protein fitness prediction, and enhances function prediction on two tasks. Through two challenging case studies, we also show that customization via ProteinTTT achieves more accurate antibody-antigen loop modeling and enhances 19% of structures in the Big Fantastic Virus Database, delivering improved predictions where general-purpose AlphaFold2 and ESMFold struggle.
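The customization idea in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the ProteinTTT implementation: the "pre-trained model" is a zero-initialized table of logits indexed by the left-neighbor residue, the self-supervised objective is assumed to be masked-residue cross-entropy, and the names `adapt_to_target`, `masked_loss`, and the example sequence are hypothetical. The point is only the shape of the loop: take a copy of the model, repeatedly mask positions of the single target sequence, and take gradient steps on that sequence alone.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
START = len(AMINO_ACIDS)  # context index used for the first residue


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def context(seq, i):
    """Toy context: the residue to the left of position i."""
    return AA_INDEX[seq[i - 1]] if i > 0 else START


def masked_loss(weights, seq, positions):
    """Mean cross-entropy of the true residues at the given positions."""
    losses = []
    for i in positions:
        probs = softmax(weights[context(seq, i)])
        losses.append(-math.log(probs[AA_INDEX[seq[i]]]))
    return sum(losses) / len(losses)


def adapt_to_target(weights, seq, steps=30, mask_frac=0.15, lr=0.5, seed=0):
    """Fine-tune a copy of the weights using only the target sequence."""
    rng = random.Random(seed)
    tuned = [row[:] for row in weights]  # leave the original model untouched
    n_mask = max(1, round(mask_frac * len(seq)))
    for _ in range(steps):
        # Pick the positions treated as masked in this step.
        for i in rng.sample(range(len(seq)), n_mask):
            row = tuned[context(seq, i)]
            grad = softmax(row)
            grad[AA_INDEX[seq[i]]] -= 1.0  # d(cross-entropy)/d(logits) = p - one_hot
            for k in range(len(row)):
                row[k] -= lr * grad[k] / n_mask
    return tuned


# Hypothetical target sequence and a uniform stand-in for a pre-trained model.
target = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
pretrained = [[0.0] * len(AMINO_ACIDS) for _ in range(len(AMINO_ACIDS) + 1)]

all_positions = range(len(target))
loss_before = masked_loss(pretrained, target, all_positions)
customized = adapt_to_target(pretrained, target)
loss_after = masked_loss(customized, target, all_positions)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In this sketch the customized copy is discarded after the target is processed, so each new protein would start again from the same general-purpose weights.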

Anton Bushuiev, Roman Bushuiev, Olga Pimenova, Nikola Zadorozhny, Raman Samusevich, Elisabet Manaskova, Rachel Seongeun Kim, Hannes Stärk, Jiri Sedlar, Martin Steinegger, Tomáš Pluskal, Josef Sivic • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Protein fitness prediction | ProteinGym (test) | Avg. Spearman Correlation: 0.5087 | 14 |
| Fitness Prediction | MaveDB subset of 50 proteins | Average Spearman Correlation: 0.5462 | 10 |
| Protein Structure Prediction | CAMEO 18 low-confidence targets (test) | TM-score: 0.5047 | 10 |
| Subcellular Localization Prediction | setHard (test) | Accuracy: 63.4 | 2 |
| TPS substrate classification | TPS dataset (cross-validation) | mAP: 81.1 | 2 |