PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis

About

Early detection, accurate segmentation, classification, and tracking of polyps during colonoscopy are critical for preventing colorectal cancer. Many existing deep-learning-based methods for analyzing colonoscopic videos require task-specific fine-tuning, lack tracking capabilities, or rely on domain-specific pre-training. In this paper, we introduce PolypSegTrack, a novel foundation model that jointly addresses polyp detection, segmentation, classification, and unsupervised tracking in colonoscopic videos. Our approach leverages a novel conditional mask loss that enables flexible training across datasets with either pixel-level segmentation masks or bounding-box annotations, allowing us to bypass task-specific fine-tuning. Our unsupervised tracking module reliably associates polyp instances across frames using object queries, without relying on any heuristics. We use a robust vision foundation model backbone that is pre-trained without supervision on natural images, thereby removing the need for domain-specific pre-training. Extensive experiments on multiple polyp benchmarks demonstrate that our method significantly outperforms existing state-of-the-art approaches in detection, segmentation, classification, and tracking.

Anwesa Choudhuri, Zhongpai Gao, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu • 2025
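
The abstract does not spell out how the conditional mask loss is computed. As a rough illustration of the general idea only, the sketch below applies a mask term when a training sample carries a pixel-level mask and falls back to a box-only loss otherwise. Every name here (`l1_box_loss`, `bce_mask_loss`, `conditional_loss`, `mask_weight`) is hypothetical, and the choice of an L1 box term and a binary cross-entropy mask term is an assumption, not the authors' formulation.

```python
import math


def l1_box_loss(pred_box, gt_box):
    """Mean absolute error between two (x1, y1, x2, y2) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box)) / len(gt_box)


def bce_mask_loss(pred_mask, gt_mask, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, g in zip(pred_mask, gt_mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
    return total / len(gt_mask)


def conditional_loss(pred_box, gt_box, pred_mask, gt_mask=None, mask_weight=1.0):
    """Box loss always; mask loss only when a ground-truth mask exists.

    Assumption: samples from box-only datasets pass gt_mask=None, so the
    mask prediction receives no supervision from them.
    """
    loss = l1_box_loss(pred_box, gt_box)
    if gt_mask is not None:
        loss += mask_weight * bce_mask_loss(pred_mask, gt_mask)
    return loss


# Box-only sample: the mask prediction is ignored.
box_only = conditional_loss([0, 0, 1, 1], [0, 0, 1, 1], [0.9, 0.1])
# Sample with a mask: the mask term is added to the box term.
with_mask = conditional_loss([0, 0, 1, 1], [0, 0, 1, 1], [0.9, 0.1], [1, 0])
```

Under this assumption a single model can be trained on a mix of mask-annotated and box-annotated datasets, because the presence of a mask in each sample decides which terms contribute.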

Related benchmarks

Task	Dataset	Result
Detection	KUMC	F1 Score 91.1	20
Biopsy-site localization	In-house colposcopy dataset (five-fold cross-validation)	Recall 65.5	10
Joint Detection and Segmentation	ETIS (Unseen)	Dice Coefficient 91.4	7
Joint Detection and Segmentation	CVC-ColonDB (unseen)	Dice 83.3	7
Joint Detection and Segmentation	CVC-300 (Unseen)	Dice Coefficient 93.2	7
Semantic segmentation	Kvasir-SEG (val)	Dice 94.7	7
Semantic segmentation	CVC-ClinicDB (val)	Dice 95.6	7
Object Detection	Kvasir-SEG (val)	Precision 98	5
Object Detection	CVC-ClinicDB (val)	Precision 98.4	5
Polyp Tracking	REAL-colon (subset)	DetA 57.7	2
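
Several rows above report a Dice score. For reference, the standard Dice coefficient between a predicted and a ground-truth binary mask is 2|A ∩ B| / (|A| + |B|). The helper below is a minimal sketch of that definition on flattened masks; the function name `dice_coefficient` is illustrative, and how each benchmark averages scores over images or scales them to percentages is not stated here.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping pixel, two predicted, one in the ground truth: 2*1 / (2+1).
partial = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
perfect = dice_coefficient([1, 0, 1, 0], [1, 0, 1, 0])
```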

