Beyond the Encoder: Joint Encoder-Decoder Contrastive Pre-Training Improves Dense Prediction

About

Contrastive learning methods in self-supervised settings have primarily focused on pre-training encoders, while decoders are typically introduced and trained separately for downstream dense prediction tasks. This conventional approach overlooks the potential benefits of jointly pre-training the encoder and the decoder. In this paper, we propose DeCon, an efficient encoder-decoder self-supervised learning (SSL) framework that supports joint contrastive pre-training. We first extend existing SSL architectures to accommodate diverse decoders and their corresponding contrastive losses. We then introduce a weighted encoder-decoder contrastive loss with non-competing objectives that enables joint pre-training of encoder-decoder architectures. By adapting a contrastive SSL framework to dense prediction, DeCon achieves state-of-the-art performance on most of the evaluated tasks when pre-trained on ImageNet-1K, COCO, and COCO+. Notably, when pre-training a ResNet-50 encoder on the COCO dataset, DeCon improves COCO object detection and instance segmentation over the baseline framework by +0.37 AP and +0.32 AP, respectively, and boosts semantic segmentation by +1.42 mIoU on Pascal VOC and by +0.50 mIoU on Cityscapes. These improvements generalize across recent backbones, decoders, datasets, and dense tasks beyond segmentation and object detection, and they persist in out-of-domain scenarios, including limited-data settings. This demonstrates that joint pre-training significantly enhances representation quality for dense prediction. Code is available at https://github.com/sebquetin/DeCon.git.
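
The abstract describes combining an encoder-level contrastive loss with decoder-level (dense) contrastive losses through a weighting. The plain-Python sketch below illustrates that general idea with a standard InfoNCE loss. It is not DeCon's implementation. The InfoNCE form, the temperature, the location-to-location matching of decoder features, and the weight `alpha` are all assumptions chosen for illustration. The mechanism that makes DeCon's objectives non-competing is not reproduced here; see the official repository for the actual method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float = 0.2) -> float:
    """Standard InfoNCE: query i's positive is key i; every other key is a negative."""
    if not queries or len(queries) != len(keys):
        raise ValueError("queries and keys must be non-empty and equal in length")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(q)


def dense_info_nce(query_maps, key_maps, temperature: float = 0.2) -> float:
    """Dense variant: every spatial location of a decoder feature map is a sample.

    ASSUMPTION: the two views are spatially aligned, so location j in one view
    is the positive for location j in the other. Real dense methods often match
    locations by geometry or feature similarity instead.
    """
    if len(query_maps) != len(key_maps) or any(
        len(a) != len(b) for a, b in zip(query_maps, key_maps)
    ):
        raise ValueError("feature maps must have matching shapes")
    queries = [vec for fmap in query_maps for vec in fmap]
    keys = [vec for fmap in key_maps for vec in fmap]
    return info_nce(queries, keys, temperature)


def joint_loss(enc_q, enc_k, dec_q, dec_k, alpha: float = 0.5, temperature: float = 0.2) -> float:
    """Convex combination of encoder-level and decoder-level contrastive losses.

    `alpha` is a hypothetical weight, not a value taken from DeCon.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l_enc = info_nce(enc_q, enc_k, temperature)
    l_dec = dense_info_nce(dec_q, dec_k, temperature)
    return alpha * l_enc + (1.0 - alpha) * l_dec


if __name__ == "__main__":
    # Two images: global 2-D encoder embeddings for view 1 (q) and view 2 (k).
    enc_q = [[1.0, 0.0], [0.0, 1.0]]
    enc_k = [[0.9, 0.1], [0.1, 0.9]]
    # Two images, each with a flattened 2-location decoder feature map per view.
    dec_q = [[[1.0, 0.0], [0.0, 1.0]], [[0.7, 0.7], [0.7, -0.7]]]
    dec_k = [[[0.9, 0.1], [0.1, 0.9]], [[0.6, 0.8], [0.8, -0.6]]]
    print(joint_loss(enc_q, enc_k, dec_q, dec_k, alpha=0.5))
```

Setting `alpha=1.0` reduces the sketch to encoder-only pre-training, the conventional setup the abstract contrasts against. In a real model both terms would backpropagate through the shared encoder, which is how a decoder-level objective can also influence encoder features.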

Sébastien Quetin, Tapotosh Ghosh, Farhad Maleki · 2025

Related benchmarks

Task                      Dataset                 Metric      Result  Rank
Semantic segmentation     ADE20K (val)            mIoU        48.02   2731
Semantic segmentation     ADE20K                  mIoU        39.25   936
Object detection          COCO (val)              --          --      613
Instance segmentation     COCO (val)              AP (mask)   40.37   472
Object detection          COCO                    AP50 (box)  62.43   190
Semantic segmentation     ISIC (test)             mIoU        83.66   59
Human keypoint detection  COCO                    AP          65.88   30
Semantic segmentation     PASCAL VOC 2007 (test)  mIoU        75.4    29
Semantic segmentation     VOC (val)               mIoU        73.81   25
Panoptic segmentation     COCO                    PQ          40.9    23

Showing 10 of 16 rows.
