
BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs

About

Existing Protein Language Models (PLMs) often adapt poorly to multiple tasks and generalize poorly across diverse biological contexts. In contrast, general-purpose Large Language Models (LLMs) cannot interpret protein sequences and lack domain-specific knowledge, which limits their capacity for effective biosemantic reasoning. To combine the advantages of both, we propose BioBridge, a domain-adaptive continual pretraining framework for protein understanding. This framework employs Domain-Incremental Continual Pre-training (DICP) to infuse protein domain knowledge and a general reasoning corpus into an LLM simultaneously, effectively mitigating catastrophic forgetting. Cross-modal alignment is achieved via a PLM-Projector-LLM pipeline, which maps protein sequence embeddings into the semantic space of the language model. Finally, end-to-end optimization is adopted to support various tasks uniformly, including protein property prediction and knowledge question-answering. BioBridge demonstrates performance comparable to that of mainstream PLMs on multiple protein benchmarks, such as EC and BindingDB. It also achieves results on par with LLMs on general understanding tasks such as MMLU and RACE. This shows the advantage of combining domain-specific adaptability with general-purpose language competency.

Yujia Wang, Jihong Guan, Wengen Li, Shuigeng Zhou, Xuhong Wang • 2026
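
The PLM-Projector-LLM pipeline described in the abstract can be illustrated with a small sketch. The abstract does not specify the projector's architecture, so the two-layer MLP, the dimensions (`PLM_DIM`, `HIDDEN_DIM`, `LLM_DIM`), the random weights, and the helper names (`make_linear`, `project`, `build_llm_input`) below are all assumptions chosen for illustration, not the authors' implementation. The sketch only shows the general idea: per-residue embeddings from a protein model are mapped to the language model's embedding width and placed alongside text token embeddings.

```python
import random

# Hypothetical widths; the real PLM and LLM hidden sizes are not given in the abstract.
PLM_DIM, HIDDEN_DIM, LLM_DIM = 8, 16, 12


def make_linear(in_dim, out_dim, rng):
    """Return (weight, bias); random weights stand in for learned parameters."""
    scale = in_dim ** -0.5
    weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def linear(x, layer):
    """Apply y = W x + b to a single vector."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def project(protein_embeddings, layer1, layer2):
    """Map each PLM embedding to the LLM embedding width (assumed two-layer MLP)."""
    projected = []
    for vec in protein_embeddings:
        hidden = [max(0.0, h) for h in linear(vec, layer1)]  # ReLU
        projected.append(linear(hidden, layer2))
    return projected


def build_llm_input(projected_protein, text_embeddings):
    """Prepend projected protein vectors to the text token embeddings."""
    return projected_protein + text_embeddings


rng = random.Random(0)
layer1 = make_linear(PLM_DIM, HIDDEN_DIM, rng)
layer2 = make_linear(HIDDEN_DIM, LLM_DIM, rng)

# Stand-ins for PLM output (5 residues) and LLM text embeddings (3 tokens).
protein = [[rng.random() for _ in range(PLM_DIM)] for _ in range(5)]
text = [[0.0] * LLM_DIM for _ in range(3)]

sequence = build_llm_input(project(protein, layer1, layer2), text)
print(len(sequence), len(sequence[0]))  # 8 12
```

In a real system the projector weights would be trained so that the projected vectors are meaningful to the language model; here they are random, so only the shapes are informative.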

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reasoning | OpenBookQA | Accuracy 88.4 | 63 |
| General Reasoning | MMLU | Accuracy 63.3 | 15 |
| Localization | DL Multi PFMBench (test) | Score 0.8152 | 11 |
| Localization | DL Bin PFMBench (test) | Score 0.9269 | 11 |
| Interaction | M. I. Bin. PFMBench (test) | Score 76.111 | 10 |
| Interaction | BindingDB | Score 0.1715 | 8 |
| General Reasoning | AGIEval | Accuracy 39.82 | 4 |
| General Reasoning | RACE | Accuracy 84 | 4 |
| Solubility | DeepSol PFMBench (test) | Score 0.8288 | 3 |
| Annotation | EC PFMBench (test) | Score 74.252 | 2 |
Showing 10 of 16 rows
