
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

About

To build an artificial neural network that resembles biological intelligence systems, recent works have unified numerous tasks into a single generalist model, which processes various tasks with shared parameters and has no task-specific modules. While generalist models achieve promising results on various benchmarks, they suffer performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main cause of this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) into generalist models. Routing strategies under different levels of conditions are proposed to take both training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs preserves the ability of generalist models to perform zero-shot inference on new tasks, e.g., video-text retrieval and video captioning. Code and pre-trained generalist models will be released.
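The abstract does not spell out the routing rule, so the following is only a minimal, hypothetical sketch of the general idea of conditional routing. It assumes the gate is computed from a per-token condition (here a modality label such as "image" or "text") instead of from the token's content. The class name ConditionalMoE, the condition labels, top-k softmax gating, and the linear "experts" are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: List[Vector], v: Sequence[float]) -> Vector:
    # Plain matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


class ConditionalMoE:
    """Hypothetical MoE layer: the router sees a condition embedding, not the token."""

    def __init__(self, dim: int, num_experts: int, conditions: Sequence[str],
                 top_k: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.top_k = top_k
        # Each expert is a dim x dim linear map (a stand-in for an FFN block).
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
                        for _ in range(num_experts)]
        # One embedding per condition (assumed here to be a modality label).
        self.cond_emb = {c: [rng.gauss(0, 1) for _ in range(dim)] for c in conditions}
        # Router: maps a condition embedding to one logit per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

    def route(self, condition: str) -> Dict[int, float]:
        # Keep the top_k highest-scoring experts and softmax their logits.
        logits = matvec(self.router, self.cond_emb[condition])
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return dict(zip(top, weights))

    def __call__(self, token: Vector, condition: str) -> Vector:
        # Weighted sum of the selected experts' outputs for this token.
        out = [0.0] * len(token)
        for idx, w in self.route(condition).items():
            y = matvec(self.experts[idx], token)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


layer = ConditionalMoE(dim=4, num_experts=8, conditions=["image", "text", "video"])
print(layer.route("image"))                    # expert index -> gate weight
print(layer([0.1, 0.2, 0.3, 0.4], "text"))     # mixed output for one text token
```

Because `route` depends only on the condition in this sketch, every token carrying the same label is sent to the same experts; a token-level router would instead score each token from its own content.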

Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | ImageNet-1K | Top-1 Accuracy | 87 | 524
Image Classification | Flowers102 | Accuracy | 89.8 | 478
Text-to-Image Retrieval | Flickr30K | R@1 | 83.7 | 460
Natural Language Understanding | GLUE | SST-2 | 93.4 | 452
Image-to-Text Retrieval | Flickr30K | R@1 | 94.1 | 379
Image Classification | ImageNet | Top-1 Accuracy | 77.7 | 324
Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 87 | 287
Text-to-Video Retrieval | MSVD | R@1 | 52.3 | 218
Video Classification | Kinetics 400 (val) | Top-1 Accuracy | 84.2 | 204
Image Retrieval | Flickr30K | R@1 | 75.9 | 144
(Showing 10 of 26 rows.)
