TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
About
Learning discriminative task-specific features for multiple distinct tasks simultaneously is a fundamental problem in multi-task learning. Recent state-of-the-art models directly decode task-specific features from one shared task-generic feature (e.g., a feature from a backbone layer) and use carefully designed decoders to produce multi-task features. However, because the input feature is fully shared and each task decoder applies the same decoding parameters to every input sample, the feature decoding process is static and produces less discriminative task-specific representations. To tackle this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts model that learns multiple representative task-generic feature spaces and decodes task-specific features dynamically. Specifically, TaskExpert introduces a set of expert networks that decompose the backbone feature into several representative task-generic features. The task-specific features are then decoded by dynamic task-specific gating networks operating on the decomposed task-generic features. Furthermore, to establish long-range modeling of the task-specific representations from different layers of TaskExpert, we design a multi-task feature memory that is updated at each layer and acts as an additional feature expert for dynamic task-specific feature decoding. Extensive experiments demonstrate that TaskExpert clearly outperforms previous best-performing methods on all 9 metrics of two competitive multi-task learning benchmarks for visual scene understanding (i.e., PASCAL-Context and NYUD-v2). Code and models will be made publicly available at https://github.com/prismformore/Multi-Task-Transformer.
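The decoding scheme described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the linear experts and gates, the function names `decode_task_features` and `update_memory`, the task names, the momentum-based memory update, and the reuse of the same experts across layers are all assumptions made for illustration. The sketch only shows the general idea of input-dependent, per-task gating over several expert features, with a per-task memory vector treated as one extra expert.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def decode_task_features(x, experts, gates, memory):
    """One hypothetical layer: experts decompose x, per-task gates mix them.

    The per-task memory vector is appended as an additional expert
    (assumed form; the paper's exact mechanism may differ).
    """
    # Each expert maps the shared backbone feature to a task-generic feature.
    expert_feats = [matvec(W, x) for W in experts]
    task_feats, gate_weights = {}, {}
    for task, G in gates.items():
        candidates = expert_feats + [memory[task]]
        # Gating weights depend on the input sample, so decoding is dynamic.
        weights = softmax(matvec(G, x))
        dim = len(candidates[0])
        task_feats[task] = [
            sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)
        ]
        gate_weights[task] = weights
    return task_feats, gate_weights


def update_memory(memory, task_feats, momentum=0.5):
    """Assumed update rule: blend old memory with the new task feature."""
    return {
        t: [momentum * m + (1 - momentum) * f for m, f in zip(memory[t], task_feats[t])]
        for t in memory
    }


rng = random.Random(0)
dim, num_experts = 4, 3
tasks = ["semseg", "depth"]  # illustrative task names
experts = [rand_matrix(rng, dim, dim) for _ in range(num_experts)]
# Each gate scores num_experts experts plus the memory "expert".
gates = {t: rand_matrix(rng, num_experts + 1, dim) for t in tasks}
memory = {t: [0.0] * dim for t in tasks}
x = [rng.uniform(-1, 1) for _ in range(dim)]

# Two layers; the same experts are reused here purely to keep the sketch short.
for _ in range(2):
    feats, weights = decode_task_features(x, experts, gates, memory)
    memory = update_memory(memory, feats)
```

Because the gating weights are computed from the input `x`, two different samples generally receive different expert mixtures for the same task, which is the contrast with the static, fully shared decoding that the abstract criticizes.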
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Surface Normal Estimation | NYU v2 (test) | -- | 206 |
| Depth Estimation | NYU Depth V2 | RMSE 0.5157 | 177 |
| Semantic segmentation | NYUD v2 | mIoU 55.35 | 96 |
| Saliency Detection | Pascal Context (test) | maxF 84.87 | 57 |
| Surface Normal Estimation | Pascal Context (test) | mErr 13.56 | 50 |
| Multi-task Learning | Pascal Context | mIoU (Semantic Segmentation) 75.04 | 47 |
| Boundary Detection | Pascal Context (test) | ODSF 73.3 | 34 |
| Human Part Parsing | Pascal Context (test) | mIoU 69.42 | 20 |
| Boundary Detection | NYUD v2 | ODS F-measure 78.4 | 17 |
| Boundary Detection | NYUD2 | ODS Fmax 78.4 | 15 |