DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
About
Pre-trained Vision Transformers now serve as powerful tools for computer vision. Yet efficiently adapting them to multiple tasks remains a challenge: the rich hidden representations encoded by the learned weight matrices must be modified without inducing interference between tasks. Current parameter-efficient methods like LoRA, which apply low-rank updates, force tasks to compete within constrained subspaces, ultimately degrading performance. We introduce DiTASK, a novel Diffeomorphic Multi-Task Fine-Tuning approach that maintains pre-trained representations by preserving weight matrix singular vectors, while enabling task-specific adaptations through neural diffeomorphic transformations of the singular values. By following this approach, DiTASK enables both shared and task-specific feature modulations with minimal added parameters. Our theoretical analysis shows that DiTASK achieves full-rank updates during optimization, preserving the geometric structure of pre-trained features and establishing a new paradigm for efficient multi-task learning (MTL). Our experiments on PASCAL MTL and NYUD show that DiTASK achieves state-of-the-art performance across four dense prediction tasks, using 75% fewer parameters than existing methods. Our code is available [here](https://github.com/ipsitmantri/DiTASK).
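To make the central idea concrete, here is a minimal NumPy sketch of adapting a weight matrix by warping only its singular values while leaving the singular vectors fixed. It is an illustration under stated assumptions, not the authors' implementation: the sinusoidal `monotone_warp`, the `headroom` factor, and the scalar parameters `a_shared` and `a_task` are hypothetical stand-ins for the neural diffeomorphic transformations the abstract describes.

```python
import numpy as np

def monotone_warp(x, a):
    """Smooth, strictly increasing map of [0, 1] onto itself (needs |a| < 1).

    Hypothetical stand-in for a learned diffeomorphism: its derivative is
    1 + a * cos(pi * x), which stays positive whenever |a| < 1.
    """
    return x + a * np.sin(np.pi * x) / np.pi

def adapt_weight(W, a_shared, a_task, headroom=1.25):
    """Warp the singular values of W; keep its singular vectors unchanged."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Normalise into the open interval (0, 1) so the warp acts on every value.
    scale = headroom * s.max()
    x = s / scale
    x = monotone_warp(x, a_shared)  # modulation shared by all tasks (assumed)
    x = monotone_warp(x, a_task)    # task-specific modulation (assumed)
    # Rebuild the matrix from the original singular vectors and new values.
    return (U * (scale * x)) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))          # toy "pre-trained" weight matrix
W_task = adapt_weight(W, a_shared=0.3, a_task=-0.2)
delta = W_task - W                       # the resulting weight update
```

In this toy setting the update `delta` is controlled by only two scalars yet changes every singular direction of `W`, whereas a low-rank update is confined to a few directions. Because each warp is strictly increasing, the ordering of the singular values is preserved.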
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU 56.08 | 1154 |
| Depth Estimation | NYU v2 (test) | -- | 432 |
| Depth Estimation | NYU Depth V2 | RMSE 0.65 | 209 |
| Semantic segmentation | NYUD v2 (test) | mIoU 44.01 | 187 |
| Semantic segmentation | NYUD v2 | mIoU 41.13 | 125 |
| Multi-task Learning | Pascal Context | mIoU (Semantic Segmentation) 76.23 | 64 |
| Multi-task Learning | PASCAL Context (val) | SemSeg mIoU 70.09 | 24 |
| Monocular Depth Estimation | Cityscapes (test) | RMSE 6.35 | 18 |
| Multi-task Learning | NYUD v2 | mIoU (Semantic Segmentation) 37.36 | 9 |
| Surface Normals Estimation | NYUD v2 | RMSE 27.25 | 6 |