Generative Auto-Bidding with Unified Modeling and Exploration

About

Automated bidding is central to modern digital advertising. Early rule-based methods lacked adaptability, while subsequent Reinforcement Learning approaches modeled bidding as a Markov Decision Process but struggled with long-term dependencies. Recent generative models show promise, yet they lack explicit mechanisms to balance exploration and safety, relying solely on action perturbations or trajectory guidance without a safety fallback. This results in inefficient exploration and elevated financial risk for advertising platforms. To address this gap, we propose GUIDE (Generative Auto-Bidding with Unified Modeling and Exploration), a framework that synergistically integrates directed exploration with a safe fallback mechanism. GUIDE employs a Decision Transformer (DT) to jointly model historical bidding actions and environmental state transitions. A Q-value module guides the DT's exploration via regularization constraints, while an Inverse Dynamics Module (IDM) leverages DT-predicted future states to infer robust, behaviorally consistent actions as a safe policy fallback. The Q-value module then adaptively selects the final action between these two options, balancing exploration and safety. Together, these components form an integrated "explore-safeguard-select" pipeline that unifies efficiency and safety. We conduct extensive experiments on public datasets, in simulated auction environments, and through large-scale online deployment on Taobao, a leading Chinese advertising platform. Results show GUIDE consistently outperforms state-of-the-art baselines across all scenarios. In real-world deployment, GUIDE achieves notable gains: +4.10% ad GMV, +1.40% ad clicks, +1.66% ad cost, and +3.52% ad ROI, demonstrating its effectiveness and strong industrial applicability.
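The final "select" step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `select_action` and `toy_q`, the scalar bid representation, and the tie-breaking rule are all assumptions made for illustration, and the toy critic stands in for a learned Q-value module. The sketch only shows the general idea of scoring an exploratory candidate (as the DT would propose) and a fallback candidate (as the IDM would propose) with one Q-function and keeping the higher-scoring one.

```python
from typing import Callable, Sequence

State = Sequence[float]
QFunction = Callable[[State, float], float]


def select_action(
    state: State,
    explore_action: float,
    safe_action: float,
    q_value: QFunction,
) -> float:
    """Pick the candidate with the higher Q-value.

    Assumption: ties go to the safe (fallback) action.
    """
    q_explore = q_value(state, explore_action)
    q_safe = q_value(state, safe_action)
    return explore_action if q_explore > q_safe else safe_action


def toy_q(state: State, action: float) -> float:
    # Hypothetical stand-in for a learned critic: it prefers bids
    # close to the first state feature (e.g. a budget fraction).
    return -(action - state[0]) ** 2


state = [0.5]
# The exploratory bid is far from the critic's preferred value, so the fallback wins.
far_choice = select_action(state, explore_action=0.9, safe_action=0.6, q_value=toy_q)
# The exploratory bid is closer than the fallback, so exploration wins.
near_choice = select_action(state, explore_action=0.55, safe_action=0.6, q_value=toy_q)
```

In this toy setting the selector falls back to the safe bid when the exploratory bid scores worse and keeps the exploratory bid when it scores better, which is the behaviour the "explore-safeguard-select" description suggests at a high level.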

Mingming Zhang, Feiqing Zhuang, Na Li, Shengjie Sun, Xiaowei Chen, Junxiong Zhu, Fei Xiao, Keping Yang, Lixin Zou, Chenliang Li • 2026

Related benchmarks

Task	Dataset	Result	
Auto-bidding	Taobao real-world A/B (test)	ROI +3.52%	10
Auto-bidding	AuctionNet (50% budget)	Score 20.3	9
Auto-bidding	AuctionNet (75% budget)	Score 29.1	9
Auto-bidding	AuctionNet (100% budget)	Score 37.6	9
Auto-bidding	AuctionNet (125% budget)	Score 43.3	9
Auto-bidding	AuctionNet (150% budget)	Score 48.3	9
Auto-bidding	Simulation environment	Score 8.34e+3	8

