Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

About

To build an artificial neural network that resembles biological intelligence systems, recent works have unified numerous tasks into a generalist model, which processes various tasks with shared parameters and has no task-specific modules. While generalist models achieve promising results on various benchmarks, they suffer performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main cause of this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Routing strategies under different levels of conditions are proposed to take both training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, introducing Conditional MoEs preserves the ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video captioning. Code and pre-trained generalist models shall be released.

Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai • 2022
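
The abstract describes routing "under different levels of conditions" without giving details, so the following is only a minimal sketch of a generic top-k mixture-of-experts layer whose gate reads a separate conditioning vector. Everything in it is an assumption for illustration, not the paper's implementation: the function names, the linear gate, the linear experts, and the idea that the condition may be either the token itself or a coarser vector such as a task or modality embedding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def route(condition, gate_weights, top_k):
    """Pick the top_k experts for one routing condition.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    logits = matvec(gate_weights, condition)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def conditional_moe(token, condition, gate_weights, experts, top_k=2):
    """Mix the outputs of the selected experts for a single token.

    `condition` is whatever the gate looks at (hypothetical here): the token
    itself for per-token routing, or a coarser vector such as a task or
    modality embedding shared by every token of an input.
    """
    output = [0.0] * len(experts[0])
    for index, weight in route(condition, gate_weights, top_k):
        expert_out = matvec(experts[index], token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


# Toy usage: 3 linear experts over 4-dimensional tokens, random weights.
rng = random.Random(0)
dim, num_experts = 4, 3
experts = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(num_experts)]
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [rng.gauss(0, 1) for _ in range(dim)]
task_embedding = [rng.gauss(0, 1) for _ in range(dim)]

per_token = conditional_moe(token, token, gate_weights, experts)
per_task = conditional_moe(token, task_embedding, gate_weights, experts)
```

In this sketch only `top_k` experts run per token, which is where the sparsity comes from. Conditioning the gate on a coarse vector means every token of an input reaches the same experts, whereas conditioning on the token lets the choice vary per token; how the paper trades these off is not stated in the abstract.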

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-1K	Top-1 Acc 87	600
Text-to-Image Retrieval	Flickr30K	R@1 83.7	559
Image Classification	Flowers102	Accuracy 89.8	558
Natural Language Understanding	GLUE	SST-2 93.4	551
Image-to-Text Retrieval	Flickr30K	R@1 94.1	451
Image Classification	ImageNet	Top-1 Accuracy 77.7	366
Image Classification	ImageNet-1k (val)	Top-1 Acc 87	303
Text-to-Video Retrieval	MSVD	R@1 52.3	290
Video Classification	Kinetics 400 (val)	Top-1 Acc 84.2	204
Image Retrieval	MS-COCO	R@1 54.1	172

Showing 10 of 26 rows
