PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis
About
Early detection, accurate segmentation, classification, and tracking of polyps during colonoscopy are critical for preventing colorectal cancer. Many existing deep-learning-based methods for analyzing colonoscopic videos require task-specific fine-tuning, lack tracking capabilities, or rely on domain-specific pre-training. In this paper, we introduce PolypSegTrack, a novel foundation model that jointly addresses polyp detection, segmentation, classification, and unsupervised tracking in colonoscopic videos. Our approach leverages a novel conditional mask loss that enables flexible training across datasets annotated with either pixel-level segmentation masks or bounding boxes, allowing us to bypass task-specific fine-tuning. Our unsupervised tracking module reliably associates polyp instances across frames using object queries, without relying on any heuristics. We build on a robust vision foundation model backbone that is pre-trained on natural images without supervision, thereby removing the need for domain-specific pre-training. Extensive experiments on multiple polyp benchmarks demonstrate that our method significantly outperforms existing state-of-the-art approaches in detection, segmentation, classification, and tracking.
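To make the idea of a conditional mask loss concrete, the following is a minimal sketch, not the paper's formulation, which the abstract does not spell out. It assumes a box term (here a plain L1 loss) that applies to every sample and a mask term (here a soft Dice loss) that is added only when a pixel-level mask is available. The function names, the choice of L1 and soft Dice, and the `mask_weight` parameter are all illustrative assumptions.

```python
from typing import Optional, Sequence

Box = Sequence[float]   # (x1, y1, x2, y2), assumed normalized to [0, 1]
Mask = Sequence[float]  # flattened per-pixel probabilities or 0/1 labels


def l1_box_loss(pred: Box, target: Box) -> float:
    """Mean absolute error between predicted and ground-truth box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def soft_dice_loss(pred: Mask, target: Mask, eps: float = 1e-6) -> float:
    """1 - soft Dice overlap between predicted probabilities and the mask."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def conditional_loss(
    pred_box: Box,
    pred_mask: Mask,
    gt_box: Box,
    gt_mask: Optional[Mask],
    mask_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Box loss always applies; the mask term is added only if a mask exists."""
    loss = l1_box_loss(pred_box, gt_box)
    if gt_mask is not None:
        loss += mask_weight * soft_dice_loss(pred_mask, gt_mask)
    return loss


# A sample from a box-only dataset contributes no mask term.
box_only = conditional_loss([0, 0, 1, 1], [0.9, 0.1], [0, 0, 1, 1], None)
# A sample from a mask-annotated dataset is also penalized for mask errors.
with_mask = conditional_loss([0, 0, 1, 1], [0.0, 1.0], [0, 0, 1, 1], [1, 0])
```

Under these assumptions, samples from box-only and mask-annotated datasets can share a batch, which is one way a single model could train on both annotation types without a separate fine-tuning stage.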
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Detection | KUMC | F1 Score | 91.1 | 20 |
| Joint Detection and Segmentation | ETIS (Unseen) | Dice Coefficient | 91.4 | 7 |
| Joint Detection and Segmentation | CVC-ColonDB (unseen) | Dice | 83.3 | 7 |
| Joint Detection and Segmentation | CVC-300 (Unseen) | Dice Coefficient | 93.2 | 7 |
| Semantic segmentation | Kvasir-SEG (val) | Dice | 94.7 | 7 |
| Semantic segmentation | CVC-ClinicDB (val) | Dice | 95.6 | 7 |
| Object Detection | Kvasir-SEG (val) | Precision | 98 | 5 |
| Object Detection | CVC-ClinicDB (val) | Precision | 98.4 | 5 |
| Polyp Tracking | REAL-colon (subset) | DetA | 57.7 | 2 |
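
For readers unfamiliar with the overlap metrics in the table, the sketch below shows the standard definitions of the Dice coefficient (computed over pixels) and the F1 score (computed over detections). It is illustrative only: the benchmarks' exact evaluation protocols, such as thresholds and matching rules, are not given here, and the reported values are assumed to be percentages.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2.0 * tp / denominator


# Two predicted pixels, one of which overlaps the single ground-truth pixel.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
# Nine correct detections, one false alarm, one missed polyp.
example_f1 = f1_score(tp=9, fp=1, fn=1)
```

The two formulas are algebraically the same quantity applied to different units: Dice counts pixels, while F1 counts detected objects.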