
Prot2Chat: Protein LLM with Early-Fusion of Text, Sequence and Structure

About

Motivation: Proteins are of great significance in living organisms, yet understanding their functions faces numerous challenges: insufficient integration of multimodal information, large numbers of training parameters, the limited flexibility of classification-based methods, and the lack of systematic evaluation metrics for protein Q&A systems. To tackle these issues, we propose the Prot2Chat framework.

Results: We modified ProteinMPNN to encode protein sequence and structural information in a unified way. A large language model (LLM) encodes questions into vectors, and a protein-text adapter compresses protein information into virtual tokens conditioned on these vectors, achieving early fusion of text and protein information. The same LLM then reads the virtual tokens together with the question to generate an answer. To optimize training efficiency, we froze the encoder and applied Low-Rank Adaptation (LoRA) to the LLM. Experiments on two datasets show that both automated metrics and expert evaluations demonstrate the superior performance of our model, and zero-shot prediction results highlight its generalization ability. The models and code are available at https://github.com/wangzc1233/Prot2Chat.

Contact: zqcao@suda.edu.cn or wangzc025@163.com

Key words: Protein Q&A, Early-Fusion, LLM
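The abstract does not specify the adapter's internals; a minimal sketch of the idea it describes is question-conditioned cross-attention pooling, where a small set of query vectors (derived from the LLM's question encoding) attends over the variable-length protein residue embeddings and compresses them into a fixed number of virtual tokens in the LLM's hidden space. All names, shapes, and the single-head attention form below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_to_virtual_tokens(protein_emb, query_tokens):
    """Cross-attention pooling (hypothetical adapter sketch).

    protein_emb:  (L, d) residue embeddings from the protein encoder
    query_tokens: (K, d) queries conditioned on the question encoding
    returns:      (K, d) virtual tokens, each a convex combination
                  of residue embeddings, ready to prepend to the
                  question tokens for early fusion.
    """
    d = protein_emb.shape[-1]
    scores = query_tokens @ protein_emb.T / np.sqrt(d)  # (K, L)
    weights = softmax(scores, axis=-1)                  # (K, L), rows sum to 1
    return weights @ protein_emb                        # (K, d)

# toy shapes: 120 residues, hidden size 64, compressed to 8 virtual tokens
rng = np.random.default_rng(0)
protein = rng.standard_normal((120, 64))
queries = rng.standard_normal((8, 64))
virtual = compress_to_virtual_tokens(protein, queries)
print(virtual.shape)  # (8, 64)
```

Regardless of the exact adapter design, the key property sketched here is that the protein side is reduced to a short, fixed-length token prefix, so the (LoRA-tuned) LLM can consume protein and text in a single input sequence.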

Zhicong Wang, Zicheng Ma, Ziqiang Cao, Changlong Zhou, Jun Zhang, Yiqin Gao• 2025

Related benchmarks

| Task              | Dataset                     | Metric   | Result | Rank |
|-------------------|-----------------------------|----------|--------|------|
| Reasoning         | OpenBookQA                  | Accuracy | 25.6   | 63   |
| General Reasoning | MMLU                        | Accuracy | 24.48  | 15   |
| Localization      | DL Bin PFMBench (test)      | Score    | 0.9012 | 11   |
| Localization      | DL Multi PFMBench (test)    | Score    | 0.6844 | 11   |
| Interaction       | M. I. Bin. PFMBench (test)  | Score    | 71.17  | 10   |
| Interaction       | BindingDB                   | Score    | 0.1717 | 8    |
| General Reasoning | AGIEval                     | Accuracy | 16.12  | 4    |
| General Reasoning | RACE                        | Accuracy | 21.58  | 4    |
