
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou

About

In this paper, we present the practical problems and the lessons learned at the short-video services of Kuaishou. In industry, a widely used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which introduces shared and task-specific experts for each task and then uses gate networks to weight the related experts' contributions. Although MoE achieves remarkable improvements, we still observe three anomalies that seriously affect model performance during our iterations: (1) Expert Collapse: experts' output distributions differ significantly, and some experts have over 90% zero activations with ReLU, making it hard for gate networks to assign fair weights that balance the experts. (2) Expert Degradation: ideally, a shared-expert should provide predictive information for all tasks simultaneously. Nevertheless, we find that some shared-experts are occupied by only one task, indicating that they have lost their sharing ability and degenerated into specific-experts. (3) Expert Underfitting: our services predict dozens of behavior tasks, but we find that some data-sparse prediction tasks tend to ignore their specific-experts and assign large weights to shared-experts. The reason might be that shared-experts receive more gradient updates and knowledge from dense tasks, while specific-experts easily fall into underfitting due to the sparsity of their behaviors. Motivated by these observations, we propose HoME to achieve a simple, efficient, and balanced MoE system for multi-task learning.
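To make the anomalies above concrete, here is a minimal NumPy sketch of the baseline multi-gate MoE forward pass the abstract describes, plus a diagnostic that measures each expert's zero-activation rate after ReLU (the quantity behind the "Expert Collapse" observation, where some experts exceed 90% zeros). This is an illustrative sketch only, not the HoME architecture itself; all names (`MMoELayer`, `zero_activation_rate`) and the random-initialization setup are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MMoELayer:
    """Minimal multi-gate Mixture-of-Experts forward pass (illustrative)."""

    def __init__(self, d_in, d_hidden, n_experts, n_tasks):
        # one linear expert per slot, one gate network per task
        self.experts = [rng.normal(0, 1, size=(d_in, d_hidden)) for _ in range(n_experts)]
        self.gates = [rng.normal(0, 1, size=(d_in, n_experts)) for _ in range(n_tasks)]

    def forward(self, x):
        # expert outputs after ReLU: shape (n_experts, batch, d_hidden)
        e_out = np.stack([relu(x @ W) for W in self.experts])
        task_outs = []
        for G in self.gates:
            w = softmax(x @ G, axis=-1)  # per-sample expert weights, (batch, n_experts)
            # weighted sum of expert outputs for this task
            task_outs.append(np.einsum('be,ebh->bh', w, e_out))
        return task_outs, e_out

def zero_activation_rate(e_out):
    # fraction of exactly-zero (ReLU-clipped) activations per expert;
    # a rate near 1.0 signals a collapsed expert
    return (e_out == 0.0).mean(axis=(1, 2))

x = rng.normal(size=(256, 16))
layer = MMoELayer(d_in=16, d_hidden=32, n_experts=4, n_tasks=3)
task_outs, e_out = layer.forward(x)
rates = zero_activation_rate(e_out)
```

With this random initialization each expert's zero rate sits near 0.5; in production, the paper reports rates above 0.9 for some experts, which starves those experts of gradient and skews the gate weights.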

Xu Wang, Jiangxia Cao, Zhiyi Fu, Kun Gai, Guorui Zhou • 2024

Related benchmarks

Task                                        Dataset                                Result           Rank
Multi-task recommendation (Effective-view)  KuaiRand-1K (offline)                  AUC 0.7876       14
Multi-task recommendation                   Tenrec QK-video                        Click AUC 84.86  9
Where prediction                            IntTravel                              HR@1 63.56       9
How prediction                              IntTravel                              Acc 67.38        9
Via prediction                              IntTravel                              HR@1 63.65       9
When prediction                             IntTravel                              Accuracy 83.28   9
Multi-task recommendation (Like)            KuaiRand-1K (offline)                  AUC 0.8891       7
Multi-task recommendation (Comment)         KuaiRand-1K (offline)                  AUC 0.7994       7
Multi-task recommendation (Like)            Kuaishou Industrial Dataset (offline)  AUC 0.965        7
Multi-task recommendation (Click)           Kuaishou Industrial Dataset (offline)  AUC 73.22        7
Showing 10 of 12 benchmark rows.
