CORE-MTL: Rethinking Gradient Balancing via Causal Orthogonal Representations
About
Multi-task learning (MTL) aims to construct a joint model for multiple tasks by sharing a common representation across domains. To achieve this goal, existing optimization-centric methods either balance task gradients or modify the shared architecture. However, as these approaches remain agnostic to the content of the shared representation, they fail to disentangle task-relevant structure from spurious context, leading to negative transfer and poor generalization. To overcome this limitation, we propose Causal Orthogonal Representations for Multi-Task Learning (CORE-MTL), a causally motivated representation-centric framework that encourages a structured semantic-residual factorization of the shared representation, concentrating task-relevant structure in the semantic stream while relegating nuisance variation to the residual stream. We instantiate this framework in the visual domain by leveraging physical priors for structured scenes and statistical constraints for attributes. Theoretically, our method enjoys a tighter out-of-distribution generalization bound than optimization-centric methods and reduces task gradient interference without explicit gradient projection or reweighting. Empirically, CORE-MTL consistently outperforms existing methods on visual multi-task benchmarks in both in-distribution and out-of-distribution settings. Code is publicly available at https://github.com/Hope-Rita/CORE-MTL.
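The abstract does not spell out the losses, so the following is only a rough sketch of the two ideas it names: a semantic-residual split of the shared representation, and interference between task gradients. It assumes, purely for illustration, that the split is a channel-wise partition of the shared features and that the streams are kept apart by penalizing their batch cross-covariance. The names `split_streams`, `orthogonality_penalty`, and `gradient_cosine` are hypothetical and are not taken from the CORE-MTL codebase.

```python
import numpy as np

def split_streams(h, d_sem):
    """Split shared features of shape (n, d) into semantic and residual parts.

    Assumption: a plain channel split; the paper's factorization may differ.
    """
    return h[:, :d_sem], h[:, d_sem:]

def orthogonality_penalty(z_sem, z_res):
    """Squared Frobenius norm of the batch cross-covariance between streams.

    The value is zero when every semantic channel is uncorrelated with
    every residual channel over the batch.
    """
    zs = z_sem - z_sem.mean(axis=0, keepdims=True)
    zr = z_res - z_res.mean(axis=0, keepdims=True)
    cross_cov = zs.T @ zr / z_sem.shape[0]
    return float(np.sum(cross_cov ** 2))

def gradient_cosine(g_a, g_b):
    """Cosine similarity of two flattened task gradients.

    Negative values indicate conflicting update directions.
    """
    denom = np.linalg.norm(g_a) * np.linalg.norm(g_b)
    return float(g_a @ g_b / denom) if denom > 0 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(32, 8))            # stand-in for shared encoder output
    z_sem, z_res = split_streams(h, d_sem=5)
    print("orthogonality penalty:", orthogonality_penalty(z_sem, z_res))
    g_seg, g_depth = rng.normal(size=100), rng.normal(size=100)
    print("gradient cosine:", gradient_cosine(g_seg, g_depth))
```

In a training loop, a penalty of this kind would typically be added to the sum of task losses with a small weight, and the gradient cosine would serve only as a diagnostic for how much two tasks pull the shared parameters in opposing directions.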
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | NYU V2 | -- | -- | 167 |
| Semantic segmentation | Cityscapes | Mean IoU | 72.29 | 68 |
| Depth Estimation | Cityscapes | Abs. Err. | 0.0123 | 65 |
| Surface Normal Estimation | NYU V2 | Mean Angular Error | 22.4927 | 65 |
| Semantic segmentation | NYU V2 | mIoU | 56.93 | 30 |
| Semantic segmentation | Cityscapes-C Robustness benchmark (test) | mIoU | 61.04 | 11 |
| Depth Estimation | Cityscapes-C Robustness benchmark (test) | Absolute Error (Abs Err) | 0.0182 | 11 |
| Depth Estimation | GTA5 to Cityscapes Sim-to-Real Transfer (Source Target Delta) | Abs Error (Source) | 0.022 | 11 |
| Semantic segmentation | GTA5 to Cityscapes Sim-to-Real Transfer (Source Target Delta) | mIoU (Source) | 65 | 11 |