Beyond the Encoder: Joint Encoder-Decoder Contrastive Pre-Training Improves Dense Prediction

About

Contrastive learning methods in self-supervised settings have primarily focused on pre-training encoders, while decoders are typically introduced and trained separately for downstream dense prediction tasks. However, this conventional approach overlooks the potential benefits of jointly pre-training both encoder and decoder. In this paper, we propose DeCon, an efficient encoder-decoder self-supervised learning (SSL) framework that supports joint contrastive pre-training. We first extend existing SSL architectures to accommodate diverse decoders and their corresponding contrastive losses. Then, we introduce a weighted encoder-decoder contrastive loss with non-competing objectives to enable the joint pre-training of encoder-decoder architectures. By adapting a contrastive SSL framework for dense prediction, DeCon establishes consistent state-of-the-art performance on most of the evaluated tasks when pre-trained on ImageNet-1K, COCO, and COCO+. Notably, when pre-training a ResNet-50 encoder on the COCO dataset, DeCon improves COCO object detection and instance segmentation compared to the baseline framework by +0.37 AP and +0.32 AP, respectively, and boosts semantic segmentation by +1.42 mIoU on Pascal VOC and by +0.50 mIoU on Cityscapes. These improvements generalize across recent backbones, decoders, datasets, and dense tasks beyond segmentation and object detection, and persist in out-of-domain scenarios, including limited-data settings, demonstrating that joint pre-training significantly enhances representation quality for dense prediction. Code is available at https://github.com/sebquetin/DeCon.git.
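The abstract names a weighted encoder-decoder contrastive loss but does not give its formula. As a rough illustration only, the sketch below assumes that objective can be pictured as a convex combination of two InfoNCE terms, one on encoder embeddings and one on decoder embeddings, weighted by a hypothetical `alpha`. The function names, the temperature of 0.2, the choice of InfoNCE, and the idea of feeding pooled (or spatially matched) decoder features as rows are all assumptions, not DeCon's released code.

```python
import numpy as np


def info_nce(q: np.ndarray, k: np.ndarray, temperature: float = 0.2) -> float:
    """InfoNCE loss: q[i] and k[i] form a positive pair (two views of the
    same sample); every other row of k acts as a negative for q[i]."""
    # L2-normalise so dot products are cosine similarities
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature                         # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; average their negative log-probability
    return float(-np.mean(np.diag(log_prob)))


def joint_loss(enc_q, enc_k, dec_q, dec_k, alpha=0.5, temperature=0.2):
    """Hypothetical weighted encoder-decoder objective: alpha trades off the
    encoder-level term against the decoder-level term (assumed form)."""
    l_enc = info_nce(enc_q, enc_k, temperature)
    l_dec = info_nce(dec_q, dec_k, temperature)
    return alpha * l_enc + (1.0 - alpha) * l_dec


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy embeddings: the second view is a noisy copy of the first
    enc_q = rng.normal(size=(8, 128))
    enc_k = enc_q + 0.1 * rng.normal(size=(8, 128))
    dec_q = rng.normal(size=(8, 64))
    dec_k = dec_q + 0.1 * rng.normal(size=(8, 64))
    print(joint_loss(enc_q, enc_k, dec_q, dec_k, alpha=0.5))
```

Setting `alpha=1.0` recovers encoder-only contrastive pre-training, which makes the weight a convenient knob for comparing against an encoder-only baseline in this toy setting.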

Sébastien Quetin, Tapotosh Ghosh, Farhad Maleki • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Semantic segmentation	ADE20K (val)	mIoU	48.02	3069
Semantic segmentation	ADE20K	mIoU	39.25	1028
Object detection	COCO (val)	--	--	637
Instance segmentation	COCO (val)	AP^mk	40.37	485
Object detection	COCO	AP50 (box)	62.43	237
Semantic segmentation	ISIC (test)	mIoU	83.66	59
Panoptic segmentation	COCO	PQ	40.9	31
Human keypoint detection	COCO	AP	65.88	30
Semantic segmentation	PASCAL VOC 2007 (test)	mIoU	75.4	29
Semantic segmentation	VOC (val)	mIoU	73.81	25

Showing 10 of 16 rows
