SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation

About

Reinforcement learning-based preference optimization is increasingly used to align list-wise generative recommenders with complex, multi-objective user feedback, yet existing optimizers such as Gradient-Bounded Policy Optimization (GBPO) exhibit structural limitations in recommendation settings. We identify a Symmetric Conservatism failure mode in which symmetric update bounds suppress learning from rare positive signals (e.g., cold-start items), static negative-sample constraints fail to prevent diversity collapse under rejection-dominated feedback, and group-normalized multi-objective rewards lead to low-resolution training signals. To address these issues, we propose SAGE (Sequence-level Adaptive Gradient Evolution), a unified optimizer designed for list-wise generative recommendation. SAGE introduces sequence-level signal alignment via a geometric-mean importance ratio and a decoupled multi-objective advantage estimator to reduce token-level variance and mitigate reward collapse, together with asymmetric adaptive bounding that applies positive Boost updates to successful slates and an entropy-aware penalty to discourage low-diversity failures. Experiments on Amazon Product Reviews and the large-scale RecIF-Bench demonstrate consistent improvements in top-K accuracy, cold-start recall, and diversity across both Semantic-ID and native-text action spaces, while preserving numerical stability during training. These results suggest that asymmetric, sequence-aware policy optimization provides a principled and effective framework for addressing optimization failures in generative recommendation.
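The geometric-mean importance ratio mentioned above can be illustrated with a minimal sketch. It assumes per-token log-probabilities of a generated slate are available under both the current and the old policy. The function names `geometric_mean_ratio` and `product_ratio` are hypothetical and are not taken from the paper; the paper's exact formulation may differ.

```python
import math


def geometric_mean_ratio(new_logprobs, old_logprobs):
    """Sequence-level importance ratio as the geometric mean of per-token ratios.

    Assumption: both inputs are equal-length lists of per-token
    log-probabilities for the same generated sequence.
    """
    if len(new_logprobs) != len(old_logprobs) or not new_logprobs:
        raise ValueError("log-prob lists must be non-empty and of equal length")
    # Averaging the log-ratios and exponentiating gives the geometric mean.
    mean_log_ratio = sum(
        n - o for n, o in zip(new_logprobs, old_logprobs)
    ) / len(new_logprobs)
    return math.exp(mean_log_ratio)


def product_ratio(new_logprobs, old_logprobs):
    """Plain product of per-token ratios, shown only for comparison."""
    return math.exp(sum(n - o for n, o in zip(new_logprobs, old_logprobs)))


# Toy example: every token is twice as likely under the new policy.
new_lp = [math.log(0.5)] * 4
old_lp = [math.log(0.25)] * 4
gm = geometric_mean_ratio(new_lp, old_lp)
prod = product_ratio(new_lp, old_lp)
```

In this toy case the geometric mean stays close to the per-token ratio of 2 regardless of sequence length, whereas the plain product grows exponentially with length (about 16 for four tokens). That length independence is one plausible reading of how a sequence-level ratio reduces token-level variance.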

Yu Xie, Xing Kai Ren, Ying Qi, Hu Yao • 2026

Related benchmarks

Task	Dataset	Result
Ad Recommendation	RecIF-Bench Ad Rec	Pass@1 0.0273	22
Label-Conditional Recommendation	RecIF-Bench Label-Cond. Rec	Pass@32 0.0574	22
Product Recommendation	RecIF-Bench Product Rec	Pass@1 2.31	22
Short Video Recommendation	RecIF-Bench Short Video Rec	Pass@1 5.74	20
Interactive Recommendation	RecIF-Bench Interactive Rec	Pass@1 13.1	13
Label Prediction	RecIF-Bench	AUC 0.7017	9
Sequential Recommendation	Amazon Product Reviews Beauty	Recall@5 6.83	9
Sequential Recommendation	Amazon Product Reviews Sports	R@5 3.85	9
Sequential Recommendation	Amazon Product Reviews Toys	R@5 7.38	9
Generative Recommendation	Amazon Product Reviews Beauty (test)	Entropy@10 2.551	3

Showing 10 of 12 rows
