Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment

About

Predicting protein function from sequence is a central challenge in computational biology. Existing methods rely heavily on structured ontologies or similarity-based techniques, so they often lack the flexibility to express structure-free functional descriptions and novel biological functions. In this work, we introduce Prot2Text-V2, a novel multimodal sequence-to-text model that generates free-form natural language descriptions of protein function directly from amino acid sequences. Our method combines a protein language model as the sequence encoder (ESM-3B) with a decoder-only language model (LLaMA-3.1-8B-Instruct) through a lightweight nonlinear modality projector. A key innovation is Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE), which improves cross-modal learning by matching mean- and std-pooled protein embeddings with text representations via a contrastive loss. After the alignment phase, we apply instruction-based fine-tuning with LoRA on the decoder to teach the model to generate accurate protein function descriptions conditioned on the protein sequence. We train Prot2Text-V2 on about 250K curated entries from SwissProt and evaluate it under low-homology conditions, where test sequences have low similarity to the training samples. Prot2Text-V2 consistently outperforms traditional and LLM-based baselines across multiple evaluation metrics.
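A minimal NumPy sketch of how an H-SCALE-style alignment step could look. It is not the authors' code: the two-layer GELU projector, applying it per residue before pooling, concatenating the masked mean and standard deviation into one vector, the random stand-ins for encoder outputs and text representations, and the symmetric CLIP-style InfoNCE loss with temperature 0.07 are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def project(x, w1, b1, w2, b2):
    """Hypothetical two-layer nonlinear projector, applied to every residue."""
    return gelu(x @ w1 + b1) @ w2 + b2

def hybrid_pool(emb, mask):
    """Masked mean- and std-pooling over residues (assumed concat layout).

    emb: (batch, length, dim); mask: (batch, length), 1 for real residues.
    Returns (batch, 2 * dim) laid out as [mean | std].
    """
    m = mask[..., None].astype(emb.dtype)
    count = np.maximum(m.sum(axis=1), 1.0)                  # (batch, 1)
    mean = (emb * m).sum(axis=1) / count
    var = (((emb - mean[:, None, :]) ** 2) * m).sum(axis=1) / count
    std = np.sqrt(var + 1e-6)                               # epsilon keeps sqrt stable
    return np.concatenate([mean, std], axis=-1)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_infonce(protein_vecs, text_vecs, temperature=0.07):
    """CLIP-style contrastive loss; row i of each matrix is a matched pair."""
    p = l2_normalize(protein_vecs)
    t = l2_normalize(text_vecs)
    logits = p @ t.T / temperature                          # scaled cosine similarities
    idx = np.arange(len(p))
    loss_p2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # protein -> text
    loss_t2p = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> protein
    return 0.5 * (loss_p2t + loss_t2p)

# Toy batch with random stand-ins for real encoder and text outputs.
rng = np.random.default_rng(0)
batch, length, d_prot, d_hidden, d_text = 4, 10, 8, 16, 12
residues = rng.normal(size=(batch, length, d_prot))         # stand-in for ESM outputs
mask = np.ones((batch, length))
mask[0, 7:] = 0                                             # first sequence has 7 residues
w1 = rng.normal(size=(d_prot, d_hidden)) * 0.1
b1 = np.zeros(d_hidden)
w2 = rng.normal(size=(d_hidden, d_text)) * 0.1
b2 = np.zeros(d_text)
projected = project(residues, w1, b1, w2, b2)               # (4, 10, 12) soft tokens
protein_vecs = hybrid_pool(projected, mask)                 # (4, 24) = [mean | std]
text_vecs = rng.normal(size=(batch, 2 * d_text))            # stand-in text representations
loss = symmetric_infonce(protein_vecs, text_vecs)
```

Masking matters because sequences in a batch differ in length; without it, padding positions would distort both the mean and the standard deviation. Pooling the standard deviation alongside the mean keeps some information about how residue embeddings vary along the sequence, which a mean alone discards.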
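The abstract names LoRA for the instruction-tuning stage. The sketch below shows the standard LoRA formulation on a single linear layer in NumPy: a frozen weight W plus a trainable low-rank update B A scaled by alpha / r. The class name, rank, alpha, and initialization scale are illustrative assumptions; the abstract does not say which decoder layers are adapted or with what hyperparameters, and this is not the API of any particular library.

```python
import numpy as np

class LoRALinear:
    """Hypothetical layer: frozen weight W plus trainable update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=8, alpha=16, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        d_out, d_in = weight.shape
        self.weight = weight                                  # frozen, never updated
        self.A = rng.normal(scale=0.01, size=(rank, d_in))    # trainable
        self.B = np.zeros((d_out, rank))                      # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (..., d_in) -> (..., d_out); the adapter path never forms a full matrix
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the adapter into one matrix for inference."""
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W, rank=4, alpha=8, rng=rng)
x = rng.normal(size=(5, 32))
y_before = layer(x)                      # B is zero, so this equals x @ W.T
layer.B = rng.normal(size=layer.B.shape) # stand-in for a gradient update
y_after = layer(x)
# trainable_params() is 4 * 32 + 64 * 4 = 384, versus 2048 entries in W
```

Zero-initializing B means fine-tuning starts exactly from the frozen decoder's behavior, and only the small A and B matrices would receive gradients.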

Xiao Fei, Michail Chatzianastasis, Sarah Almeida Carneiro, Hadi Abdine, Lawrence P. Petalidis, Michalis Vazirgiannis • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Drug-Target Interaction Prediction | BIOSNAP | Accuracy (fraction) | 0.553 | 28
Molecule-Protein Interaction | BindingDB | Accuracy (%) | 59.2 | 13
CYP Substrate Prediction | TDC CYP Substrate | CYP2C9 Accuracy (%) | 54.5 | 13
CYP Inhibition Prediction | TDC CYP Inhibition | CYP1A2 Accuracy (%) | 59.4 | 13
Molecule-Cell Interaction | DrugComb | Accuracy (%) | 65.6 | 13
Molecule-Cell Interaction | GDSC 2 | Accuracy (%) | 59.7 | 13
Molecule-Protein Interaction | Human | Accuracy (%) | 47.2 | 13
