
SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation

About

Reinforcement learning-based preference optimization is increasingly used to align list-wise generative recommenders with complex, multi-objective user feedback, yet existing optimizers such as Gradient-Bounded Policy Optimization (GBPO) exhibit structural limitations in recommendation settings. We identify a Symmetric Conservatism failure mode in which symmetric update bounds suppress learning from rare positive signals (e.g., cold-start items), static negative-sample constraints fail to prevent diversity collapse under rejection-dominated feedback, and group-normalized multi-objective rewards lead to low-resolution training signals. To address these issues, we propose SAGE (Sequence-level Adaptive Gradient Evolution), a unified optimizer designed for list-wise generative recommendation. SAGE introduces sequence-level signal alignment via a geometric-mean importance ratio and a decoupled multi-objective advantage estimator to reduce token-level variance and mitigate reward collapse, together with asymmetric adaptive bounding that applies positive Boost updates to successful slates and an entropy-aware penalty to discourage low-diversity failures. Experiments on Amazon Product Reviews and the large-scale RecIF-Bench demonstrate consistent improvements in top-K accuracy, cold-start recall, and diversity across both Semantic-ID and native-text action spaces, while preserving numerical stability during training. These results suggest that asymmetric, sequence-aware policy optimization provides a principled and effective framework for addressing optimization failures in generative recommendation.
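
The mechanisms named above can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the function names, the hyperparameters (`eps_low`, `eps_high`, `boost`, `beta`), per-objective z-scoring as the "decoupled" advantage estimator, and an entropy-scaled penalty for failed slates. The sequence-level ratio is computed as the geometric mean of per-token importance ratios, in log space. The surrogate is a PPO-style clipped objective whose upper bound is widened for positive advantages and whose penalty grows as a failed slate's category entropy falls.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def sequence_ratio(new_logps, old_logps):
    """Geometric mean of per-token ratios pi_new(y_t) / pi_old(y_t) for one slate.

    Averaging log-ratios before exponentiating keeps the ratio from exploding
    or vanishing the way a raw product of many token-level ratios can.
    """
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def decoupled_advantages(rewards_per_objective, weights, eps=1e-8):
    """Z-score each objective within the group separately, then take a weighted sum.

    rewards_per_objective: K lists, each with one reward per slate in the group.
    Assumption: this per-objective normalization stands in for the paper's
    decoupled estimator. An objective that is constant across the group
    contributes zero instead of being rescaled into noise.
    """
    n = len(rewards_per_objective[0])
    adv = [0.0] * n
    for w, rewards in zip(weights, rewards_per_objective):
        mu, sd = mean(rewards), pstdev(rewards)
        for i, r in enumerate(rewards):
            adv[i] += w * (r - mu) / (sd + eps)
    return adv


def normalized_slate_entropy(categories):
    """Shannon entropy of a slate's category labels, scaled to [0, 1].

    1.0 means every item has a distinct category; 0.0 means all items share one.
    """
    total = len(categories)
    if total < 2:
        return 0.0
    counts = Counter(categories)
    h = -sum(c / total * math.log(c / total) for c in counts.values())
    return h / math.log(total)


def asymmetric_objective(ratio, advantage, norm_entropy,
                         eps_low=0.2, eps_high=0.2, boost=0.3, beta=0.5):
    """PPO-style clipped surrogate with hypothetical asymmetric bounds.

    Positive advantage: the upper clip bound is widened by `boost`.
    Negative advantage: the advantage is scaled by (1 + beta * (1 - norm_entropy)),
    so low-diversity failures are penalized more heavily.
    """
    if advantage >= 0:
        upper = 1.0 + eps_high + boost
    else:
        upper = 1.0 + eps_high
        advantage *= 1.0 + beta * (1.0 - norm_entropy)
    clipped = max(min(ratio, upper), 1.0 - eps_low)
    return min(ratio * advantage, clipped * advantage)
```

For example, with `eps_high=0.2` and `boost=0.3`, a successful slate with ratio 1.45 keeps its full update (1.45 × A) instead of being clipped at 1.2. With `beta=0.5`, a failed slate whose items all share one category receives a penalty 1.5 times larger than an equally failed but fully diverse slate.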

Yu Xie, Xing Kai Ren, Ying Qi, Hu Yao · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Ad Recommendation | RecIF-Bench Ad Rec | Pass@1 | 0.0273 | 20
Label-Conditional Recommendation | RecIF-Bench Label-Cond. Rec | Pass@32 | 0.0574 | 20
Product Recommendation | RecIF-Bench Product Rec | Pass@1 | 2.31 | 20
Short Video Recommendation | RecIF-Bench Short Video Rec | Pass@1 | 5.74 | 20
Interactive Recommendation | RecIF-Bench Interactive Rec | Pass@1 | 13.1 | 11
Label Prediction | RecIF-Bench | AUC | 0.7017 | 9
Sequential Recommendation | Amazon Product Reviews Beauty | Recall@5 | 6.83 | 9
Sequential Recommendation | Amazon Product Reviews Sports | Recall@5 | 3.85 | 9
Sequential Recommendation | Amazon Product Reviews Toys | Recall@5 | 7.38 | 9
Generative Recommendation | Amazon Product Reviews Beauty (test) | Entropy@10 | 2.551 | 3

Showing 10 of 12 rows
