
Sparse Mixture-of-Experts are Domain Generalizable Learners

About

Human visual perception can easily generalize to out-of-distribution visual data, which is far beyond the capability of modern machine learning models. Domain generalization (DG) aims to close this gap, with existing DG methods mainly focusing on the loss function design. In this paper, we propose to explore an orthogonal direction, i.e., the design of the backbone architecture. It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets. We develop a formal framework to characterize a network's robustness to distribution shifts by studying its architecture's alignment with the correlations in the dataset. This analysis guides us to propose a novel DG model built upon vision transformers, namely Generalizable Mixture-of-Experts (GMoE). Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin. Moreover, GMoE is complementary to existing DG methods, and its performance is substantially improved when trained with DG algorithms.
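To make the core architectural idea concrete, below is a minimal NumPy sketch of the sparse top-1 expert routing that mixture-of-experts layers are built on. All names (`top1_moe_layer`, `gate_w`, `expert_ws`) are hypothetical, and this is an illustration of the general sparse-MoE mechanism, not the authors' GMoE implementation (which uses gated routing inside vision transformer blocks).

```python
import numpy as np

def top1_moe_layer(x, gate_w, expert_ws):
    """Sketch of a sparse MoE layer: each token is routed to its
    top-1 expert, and the expert output is scaled by the gate probability.

    x         : (tokens, dim) input features
    gate_w    : (dim, num_experts) gating weights
    expert_ws : list of (dim, dim) expert weight matrices
    """
    logits = x @ gate_w                                    # (tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)              # softmax gate
    chosen = probs.argmax(axis=1)                          # top-1 expert per token
    out = np.empty_like(x)
    for e, w in enumerate(expert_ws):
        mask = chosen == e
        if mask.any():
            # only the selected expert runs on each token (the "sparse" part)
            out[mask] = (x[mask] @ w) * probs[mask, e:e + 1]
    return out, chosen

# Tiny usage example with random weights
rng = np.random.default_rng(0)
dim, num_experts = 8, 4
x = rng.standard_normal((16, dim))
gate_w = rng.standard_normal((dim, num_experts))
expert_ws = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
y, chosen = top1_moe_layer(x, gate_w, expert_ws)
```

Because only one expert's weights are applied per token, compute per token stays constant while total capacity grows with the number of experts; the paper's analysis argues this conditional-computation structure aligns well with dataset correlations under distribution shift.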

Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu • 2022

Related benchmarks

Task                  | Dataset     | Metric           | Result | Rank
Domain Generalization | VLCS        | Accuracy         | 80     | 238
Domain Generalization | PACS        | Accuracy (Art)   | 89.4   | 221
Domain Generalization | OfficeHome  | Accuracy         | 73.9   | 182
Image Classification  | OfficeHome  | Average Accuracy | 74.2   | 131
Domain Generalization | DomainBed   | Average Accuracy | 67.9   | 127
Domain Generalization | DomainNet   | Accuracy         | 48.4   | 113
Image Classification  | PACS        | Accuracy         | 88.1   | 100
Image Classification  | VLCS        | Accuracy         | 80.2   | 76
Domain Generalization | Office-Home | Average Accuracy | 72.4   | 63
Image Classification  | DomainNet   | Accuracy         | 48.7   | 63

(Showing 10 of 18 rows.)
