
Gradient-Free Noise Optimization for Reward Alignment in Generative Models

About

Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through the generator and reward pipeline, which limits them to differentiable settings. To address this, we present ZeNO (Zeroth-order Noise Optimization), a gradient-free framework that formulates noise optimization as a path-integral control problem whose solution can be estimated from zeroth-order reward evaluations alone. When instantiated with an Ornstein-Uhlenbeck reference process, the update connects to Langevin dynamics that implicitly target a reward-tilted distribution. ZeNO enables effective inference-time scaling and demonstrates strong performance across diverse generators and reward functions, including a protein structure generation task where backpropagation is infeasible.
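
To make the idea of gradient-free noise optimization concrete, here is a minimal sketch in Python. It is not the ZeNO algorithm from the paper, whose exact update is not given in this abstract. It is a generic softmax-weighted (path-integral-style) zeroth-order update with a variance-preserving, Ornstein-Uhlenbeck-like proposal. The names `toy_generator`, `toy_reward`, `zeroth_order_noise_step`, and `run_demo`, the target vector, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

# Hypothetical target for the toy reward (an assumption, not from the paper).
TARGET = np.array([0.5, -0.5, 0.5, -0.5])


def toy_generator(z):
    """Stand-in for a deterministic generator mapping noise to a sample."""
    return np.tanh(z)


def toy_reward(z):
    """Black-box reward of the generated sample; no gradients are used."""
    return -float(np.sum((toy_generator(z) - TARGET) ** 2))


def zeroth_order_noise_step(z, reward_fn, rng, n_samples=64, step=0.1,
                            temperature=0.1):
    """One gradient-free update of the 1-D noise vector z (illustrative)."""
    # OU-like proposal: shrink z toward the origin and add Gaussian noise.
    # decay**2 + step == 1, so a standard normal z stays standard normal.
    decay = np.sqrt(1.0 - step)
    eps = rng.standard_normal((n_samples, z.shape[0]))
    candidates = decay * z + np.sqrt(step) * eps

    # Zeroth-order information only: one reward evaluation per candidate.
    rewards = np.array([reward_fn(c) for c in candidates])

    # Softmax weights exp(reward / temperature), shifted for stability.
    weights = np.exp((rewards - rewards.max()) / temperature)
    weights /= weights.sum()

    # Move to the reward-weighted average of the candidates.
    return weights @ candidates


def run_demo(seed=0, n_steps=50):
    """Run the toy optimization and return (initial, final) reward."""
    rng = np.random.default_rng(seed)
    z = np.zeros(TARGET.shape[0])
    initial = toy_reward(z)
    for _ in range(n_steps):
        z = zeroth_order_noise_step(z, toy_reward, rng)
    return initial, toy_reward(z)


if __name__ == "__main__":
    initial, final = run_demo()
    print(f"reward before: {initial:.3f}, after: {final:.3f}")
```

Because the update only calls `reward_fn` on candidate noise vectors, the generator and reward can be arbitrary black boxes, which is the property the abstract highlights for non-differentiable pipelines. Raising `n_samples` trades more reward evaluations for a less noisy update, a simple form of inference-time scaling.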

Jeongsol Kim, Hongeun Kim, Jian Wang, Jong Chul Ye • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Text-to-Image Generation | General Prompts | Aesthetic Score | 6.31 | 15 |
| Protein Structure Generation | Protein backbones | Fraction within 2Å | 90 | 7 |
| Text-to-Image Generation | GenEval counting and position | Count Score | 96.3 | 6 |
| Text-to-Image Generation | CUB 500 prompts | Aesthetic Score | 5.63 | 5 |
