
Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster

About

With the advancement of Large Language Models (LLMs) in natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within a CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, polypscopy, and CT scans. Our in-depth analysis reveals the potential of transferring the LLM's semantic awareness to enhance segmentation tasks, offering both improved global understanding and better local modeling capability. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.
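The hybrid structure described above can be sketched as follows: encoder features are flattened into visual tokens, passed through a single frozen transformer-style layer standing in for the pre-trained LLM layer, then reshaped back for the decoder. This is a minimal, hypothetical illustration of the idea, not the paper's implementation; all shapes, the toy layer, and the random weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FrozenLLMLayer:
    """Toy self-attention + MLP block with frozen (never-updated) weights,
    standing in for one pre-trained LLM transformer layer."""
    def __init__(self, d):
        scale = 1.0 / np.sqrt(d)
        self.wq = rng.standard_normal((d, d)) * scale
        self.wk = rng.standard_normal((d, d)) * scale
        self.wv = rng.standard_normal((d, d)) * scale
        self.w1 = rng.standard_normal((d, d)) * scale

    def __call__(self, tokens):  # tokens: (n_tokens, d)
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        attn = softmax(q @ k.T / np.sqrt(tokens.shape[1]))
        h = tokens + attn @ v                   # residual attention
        return h + np.maximum(h @ self.w1, 0)   # residual MLP (ReLU)

# Toy pipeline: CNN encoder features -> visual tokens -> frozen layer -> decoder input
feat = rng.standard_normal((8, 8, 32))     # (H, W, C) encoder feature map (illustrative)
tokens = feat.reshape(-1, 32)              # flatten spatial dims into 64 visual tokens
out = FrozenLLMLayer(32)(tokens)           # frozen layer refines the tokens
dec_in = out.reshape(8, 8, 32)             # restore spatial layout for the decoder
print(dec_in.shape)                        # (8, 8, 32)
```

In training, only the CNN encoder and decoder would receive gradient updates; the inserted layer stays frozen, which is why the trainable-parameter increase is minimal.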

Fenghe Tang, Wenxin Ma, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou • 2025

Related benchmarks

Task                         Dataset         Metric     Result    Rank
Medical Image Segmentation   BUSI (test)     --         --        121
Medical Image Segmentation   ISIC (test)     IoU        0.8339    55
Medical Image Segmentation   Kvasir (test)   F1 Score   93.9      18
Medical Image Segmentation   TNSCUI (test)   IoU        79.93     13

Other info

Code
