
Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos

About

Learning robust representations of polyp tracklets is key to enabling multiple AI-assisted colonoscopy applications, from polyp characterization to automated reporting and retrieval. Supervised contrastive learning is an effective approach for learning such representations, but it typically relies on correctly defined positive and negative pairs. Collecting these labels requires linking tracklets that depict the same underlying polyp entity throughout the video, which is costly and demands specialized clinical expertise. In this work, we leverage the sequential workflow of colonoscopy procedures to derive self-supervised associations from temporal structure. Since temporally derived associations are not guaranteed to be correct, we introduce a noise-aware contrastive loss that accounts for these noisy associations. We demonstrate the effectiveness of the learned representations across multiple downstream tasks, including polyp retrieval and re-identification, size estimation, and histology classification. Our method outperforms prior self-supervised and supervised baselines, and matches or exceeds recent foundation models across all tasks, using a lightweight encoder trained on only 27 videos. Code is available at https://github.com/lparolari/ntssl.
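The abstract does not specify the noise-aware loss, so the following is only a minimal sketch, under stated assumptions, of the general recipe it describes: derive (possibly wrong) positive pairs from temporal structure, then train with a supervised contrastive (SupCon-style) loss over those pairs. The grouping rule `temporal_pseudo_labels`, its `max_gap` threshold, and the optional per-pair `weights` for down-weighting unreliable associations are hypothetical illustrations, not the authors' method; the actual implementation is in the linked repository.

```python
import math
from typing import List, Optional, Sequence


def temporal_pseudo_labels(starts: Sequence[float], ends: Sequence[float],
                           max_gap: float) -> List[int]:
    """HYPOTHETICAL heuristic: tracklets are assumed sorted by start time.
    A tracklet joins the previous pseudo-entity if it starts at most
    `max_gap` seconds after the previous one ended; otherwise a new entity
    begins. The resulting labels are noisy by construction."""
    labels: List[int] = []
    current = 0
    for i in range(len(starts)):
        if i > 0 and starts[i] - ends[i - 1] > max_gap:
            current += 1
        labels.append(current)
    return labels


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1,
                weights: Optional[Sequence[Sequence[float]]] = None) -> float:
    """SupCon-style loss: for each anchor, the (weighted) mean negative
    log-softmax of its positives among all other samples, averaged over
    anchors that have at least one positive. `weights[i][j]` is a
    HYPOTHETICAL per-pair confidence; None means plain SupCon."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    if n < 2:
        return 0.0
    total, count = 0.0, 0
    for i in range(n):
        # Cosine-similarity logits from anchor i to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())  # subtract max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        pos = [j for j in logits if labels[j] == labels[i]]
        if not pos:
            continue
        w = [1.0 if weights is None else weights[i][j] for j in pos]
        wsum = sum(w)
        if wsum == 0:
            continue
        total += -sum(wj * (logits[j] - log_denom) for wj, j in zip(w, pos)) / wsum
        count += 1
    return total / count if count else 0.0


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = temporal_pseudo_labels([0, 5, 60, 64], [4, 9, 63, 70], max_gap=10)
print(labels, supcon_loss(emb, labels))
```

Assigning the same pseudo-label to two tracklets that actually show different polyps turns a true negative into a false positive, which pulls unrelated embeddings together; that failure mode is what a noise-aware loss is meant to mitigate, for example (in this sketch) by lowering the corresponding entries of `weights`.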

Luca Parolari, Pietro Gori, Lamberto Ballan, Carlo Biffi, Loic Le Folgoc• 2026

Related benchmarks

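| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Histology classification | PolypsSet | Accuracy | 82.38 | 18 |
| Retrieval | REAL-Colon | mAP | 63.13 | 18 |
| Re-identification | SUN | AUROC | 94.17 | 18 |
| Size estimation | PolypSize | F1 score | 70.24 | 18 |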
