Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization
About
In this paper, we present Pantheon, our milestone ensemble sort work, together with first-hand practical experience from deploying it. Pantheon transforms ensemble sorting from a "human-curated art" into a "machine-optimized science". Compared with formulation-based ensemble sort, Pantheon has the following advantages: (1) Personalized joint training: Pantheon is trained jointly with the real-time ranking model, which lets it capture ever-changing personalized user interests accurately. (2) Representation inheritance: instead of the highly compressed Pxtrs, Pantheon takes the ranking model's fine-grained hidden states as input, so it benefits from the ranking model's representations and gains modeling capacity. Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise an iterative Pareto policy optimization (IPPO) strategy that considers the multiple objectives at the same time. To our knowledge, this paper is the first work to replace the entire formulation-based ensemble sort in an industrial RecSys; it is fully deployed at Kuaishou live-streaming services, serving 400 million users daily.
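To make the contrast concrete, here is a minimal sketch of the two ideas the abstract refers to: a hand-tuned formula that fuses per-objective predictions (Pxtrs) into one score, and a Pareto-dominance filter over multi-objective outcomes. The formula shape, the function names, and the example numbers are illustrative assumptions, not the formula used at Kuaishou and not the paper's IPPO procedure.

```python
from math import prod

def formula_ensemble_score(pxtrs, weights, exponents):
    """Hypothetical hand-tuned fusion: product of (1 + w_k * p_k) ** a_k.

    pxtrs, weights and exponents are parallel sequences, one entry per
    objective. The weights and exponents are the knobs a human would tune.
    """
    return prod((1.0 + w * p) ** a for p, w, a in zip(pxtrs, weights, exponents))

def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates (higher is better)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Illustrative numbers only: each tuple is (objective_1, objective_2)
# measured for one candidate policy.
outcomes = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
front = pareto_front(outcomes)
```

In this toy example `(0.4, 0.4)` is dropped because `(0.5, 0.5)` is better on both objectives, while the other three remain as mutually incomparable trade-offs. A learned ensemble model such as the one the abstract describes would replace the fixed formula; how IPPO selects or updates policies is not specified here and is not modeled by this sketch.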
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-objective Ranking | Kuaishou-ELive (test) | AUC Sum | 3.6682 | 12 |
| Multi-objective Recommendation | Kuaishou E-live (test) | AUC | 3.67 | 9 |
| Multi-objective Ranking | TenRec QKVideo 1M (test) | AUC Sum | 3.6465 | 8 |
| Multi-objective Ranking | Kuaishou E-live Original 1:10^3 skew (test) | AUC Sum | 3.67 | 3 |
| Multi-objective Ranking | Kuaishou E-live 10x Skew 1:10^4 (test) | AUC Sum | 3.48 | 3 |
| Multi-objective Ranking | Kuaishou E-live 100x Skew 1:10^5 (test) | AUC Sum | 3.31 | 3 |