Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
About
Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static dataset with feedback. To address this, we propose AIGB-Pearl (Planning with EvaluAtor via RL), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to assess the quality of generated scores and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.
Related benchmarks
| Task | Dataset | Result (GMV) | Rank |
|---|---|---|---|
| Auto-bidding | Simulated Offline Advertising System 1.5k budget 30 advertisers | 503 | 9 |
| Auto-bidding | Simulated Offline Advertising System 2.0k Budget 30 Advertisers | 521.8 | 9 |
| Auto-bidding | Simulated Offline Advertising System 2.5k Budget 30 Advertisers | 545 | 9 |
| Auto-bidding | Simulated Offline Advertising System 3.0k Budget 30 Advertisers | 574.2 | 9 |
| Auto-bidding | TaoBao real-world A/B (test) | 7.87e+7 | 9 |
| Auto-bidding | Real-world A/B tests 4k unseen advertisers over 19 days | 6.93e+7 | 4 |
| Auto-bidding | TargetROAS real-world A/B test 300k advertisers (22 days) | 8.20e+8 | 2 |
| Auto-bidding | Simulated First-Price Auction | 1.61e+3 | 2 |