DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations

About

Pre-trained Vision Transformers now serve as powerful tools for computer vision. Yet efficiently adapting them to multiple tasks remains a challenge, because it requires modifying the rich hidden representations encoded by the learned weight matrices without inducing interference between tasks. Current parameter-efficient methods like LoRA, which apply low-rank updates, force tasks to compete within constrained subspaces, ultimately degrading performance. We introduce DiTASK, a novel Diffeomorphic Multi-Task Fine-Tuning approach that maintains pre-trained representations by preserving the singular vectors of the weight matrices, while enabling task-specific adaptations through neural diffeomorphic transformations of the singular values. With this approach, DiTASK enables both shared and task-specific feature modulations with minimal added parameters. Our theoretical analysis shows that DiTASK achieves full-rank updates during optimization, preserving the geometric structure of pre-trained features and establishing a new paradigm for efficient multi-task learning (MTL). Our experiments on PASCAL MTL and NYUD show that DiTASK achieves state-of-the-art performance across four dense prediction tasks, using 75% fewer parameters than existing methods. Our code is available [here](https://github.com/ipsitmantri/DiTASK).
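To make the core idea concrete, the following is a minimal NumPy sketch of keeping a weight matrix's singular vectors fixed while remapping only its singular values with a smooth, strictly increasing, invertible function. It is an illustration under stated assumptions, not the authors' implementation: the power-law map `monotone_map`, its parameters `log_scale` and `log_power`, and the helper `adapt_weight` are hypothetical stand-ins for the learned neural diffeomorphic transformation described above, and the random 6×4 matrix stands in for a pre-trained weight.

```python
import numpy as np

def monotone_map(s, log_scale, log_power):
    """Hypothetical stand-in for a learned diffeomorphism of the positive reals.

    s -> exp(log_scale) * s ** exp(log_power) is smooth, strictly increasing,
    and has a smooth inverse for s > 0; (0, 0) gives the identity map.
    """
    return np.exp(log_scale) * s ** np.exp(log_power)

def adapt_weight(W, log_scale, log_power):
    """Keep the singular vectors of W fixed and remap only its singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_new = monotone_map(s, log_scale, log_power)
    # U * s_new scales the columns of U, which equals U @ diag(s_new).
    return (U * s_new) @ Vt, s, s_new

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))  # toy stand-in for a pre-trained weight matrix
W_task, s, s_new = adapt_weight(W, log_scale=0.1, log_power=0.3)

# The update touches every singular direction, so it is generically full rank,
# even though only two scalars parameterize it in this toy version.
update_rank = np.linalg.matrix_rank(W_task - W)
print("singular values before:", np.round(s, 3))
print("singular values after: ", np.round(s_new, 3))
print("rank of the update:", update_rank, "of", min(W.shape))
```

In this sketch, a multi-task setup could hold one shared pair of parameters plus one pair per task, each applied to the same frozen `U` and `Vt`. Because the map is monotone, the ordering of the singular values is preserved, which is one simple way to read "preserving the geometric structure" in the toy setting.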

Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro, Chaim Baskin, Moshe Eliasof • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic segmentation | Cityscapes (test) | mIoU 56.08 | 1154 |
| Depth Estimation | NYU v2 (test) | -- | 432 |
| Depth Estimation | NYU Depth V2 | RMSE 0.65 | 209 |
| Semantic segmentation | NYUD v2 (test) | mIoU 44.01 | 187 |
| Semantic segmentation | NYUD v2 | mIoU 41.13 | 125 |
| Multi-task Learning | Pascal Context | mIoU (Semantic Segmentation) 76.23 | 64 |
| Multi-task Learning | PASCAL Context (val) | SemSeg mIoU 70.09 | 24 |
| Monocular Depth Estimation | Cityscapes (test) | RMSE 6.35 | 18 |
| Multi-task Learning | NYUD v2 | mIoU (Semantic Segmentation) 37.36 | 9 |
| Surface Normals Estimation | NYUD v2 | RMSE 27.25 | 6 |