StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

About

Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression, adapting a denoising framework with task encoding, per-task conditioning, and a tailored training scheme. Instead of per-task losses that require careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks. To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing. StableMTL outperforms baselines on 7 tasks across 8 benchmarks.
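
The abstract does not spell out how the 1-to-N task attention is wired, so the following is only a minimal, dependency-free sketch of the general idea, not the authors' implementation. It assumes a single "main" stream whose tokens act as queries over the pooled tokens of all N task streams. The function name `one_to_n_attention`, the dictionary layout, and the use of raw tokens as both keys and values (no learned projections) are all hypothetical simplifications.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_to_n_attention(main_tokens, task_streams):
    """Let one main stream attend to the tokens of all N task streams.

    main_tokens:  list of query vectors (hypothetical main stream).
    task_streams: dict mapping a task name to its list of token vectors.
    Assumption: tokens serve as both keys and values, with no learned
    projections, and every vector has the same dimension.
    """
    # Pool the tokens of every task stream into one key/value memory.
    memory = [tok for tokens in task_streams.values() for tok in tokens]
    dim = len(main_tokens[0])
    outputs = []
    for query in main_tokens:
        # Scaled dot-product score between the query and every pooled token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in memory
        ]
        weights = softmax(scores)
        # Weighted average of the pooled tokens, one output channel at a time.
        outputs.append(
            [sum(w * value[c] for w, value in zip(weights, memory))
             for c in range(dim)]
        )
    return outputs


# Two toy task streams with one 2-D token each.
streams = {"depth": [[1.0, 0.0]], "normal": [[0.0, 1.0]]}
fused = one_to_n_attention([[0.0, 0.0]], streams)
biased = one_to_n_attention([[3.0, 0.0]], streams)
```

In this sketch the number of stream-to-stream interactions grows linearly with N (one main stream reading N streams), whereas letting every stream attend to every other stream would grow quadratically.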

Anh-Quan Cao, Ivan Lopes, Raoul de Charette • 2025

Related benchmarks

Task                        Dataset            Metric               Result    Rank
Semantic segmentation       Cityscapes         mIoU                 55.79     494
Depth Estimation            KITTI              RMSE                 4.3707    156
Depth Estimation            DIODE              Delta-1 Accuracy     73.72     82
Semantic segmentation       NYU v2 (val)       mIoU                 50.43     75
Depth Estimation            NYU v2 (val)       --                   --        65
Scene Flow Estimation       KITTI              EPE (m)              0.2313    64
Surface Normal Estimation   DIODE              Mean Angle Error     23.27     27
Surface Normal Estimation   NYUv2 (val)        mAE                  21.91     19
Depth Estimation            Hypersim (test)    --                   --        17
Surface Normal Estimation   Hypersim (test)    Mean Angular Error   19.3      9
Showing 10 of 26 rows
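
For readers unfamiliar with the metrics in the table, here is a small sketch of their common textbook definitions in plain Python. The page does not state the exact evaluation protocol of each benchmark (valid-pixel masks, depth caps, ignored labels), so those details are omitted. The function names are illustrative, and skipping classes absent from both prediction and ground truth in `mean_iou` is an assumed convention.

```python
import math


def rmse(pred, gt):
    """Root mean squared error between predicted and true depths."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def delta1(pred, gt, threshold=1.25):
    """Fraction of pixels whose depth ratio max(p/g, g/p) is below 1.25."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


def mean_angular_error(pred_normals, gt_normals):
    """Mean angle in degrees between predicted and true normal vectors."""
    total = 0.0
    for p, g in zip(pred_normals, gt_normals):
        dot = sum(a * b for a, b in zip(p, g))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        # Clamp to [-1, 1] to guard acos against rounding error.
        cosine = max(-1.0, min(1.0, dot / norm))
        total += math.degrees(math.acos(cosine))
    return total / len(gt_normals)


def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and true flow vectors."""
    dists = [math.dist(p, g) for p, g in zip(pred_flow, gt_flow)]
    return sum(dists) / len(dists)


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union across classes.

    Assumption: classes absent from both prediction and ground truth
    are skipped rather than counted as zero.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c and g == c)
        union = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Tiny hand-checkable inputs.
depth_rmse = rmse([1.0, 2.0], [1.0, 4.0])
depth_delta1 = delta1([1.0, 2.0], [1.0, 1.0])
normal_err = mean_angular_error([(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)])
flow_epe = end_point_error([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
seg_miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Lower is better for RMSE, EPE, and angular error; higher is better for mIoU and Delta-1 accuracy. Delta-1 and mIoU are returned here as fractions, whereas the table appears to report them as percentages.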
