Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
About
Diffusion models achieve state-of-the-art image quality. However, sampling is costly at inference time because it requires a large number of function evaluations (NFEs). To reduce NFEs, classical ODE numerical methods have been adopted. Yet, the choice of prediction type and integration domain leads to different sampling behaviors. To address these issues, we introduce Dual-Solver, which generalizes multistep samplers through learnable parameters that continuously (i) interpolate among prediction types, (ii) select the integration domain, and (iii) adjust the residual terms. It retains the standard predictor-corrector structure while preserving second-order local accuracy. These parameters are learned via a classification-based objective using a frozen pretrained classifier (e.g., MobileNet or CLIP). For ImageNet class-conditional generation (DiT, GM-DiT) and text-to-image generation (SANA, PixArt-$\alpha$), Dual-Solver improves FID and CLIP scores in the low-NFE regime ($3 \le$ NFE $\le 9$) across backbones.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Text-to-Image Generation | MSCOCO 2014 | FID (30k)18.52 | 44 | |
| Image Generation | ImageNet 50k samples | FID2.32 | 42 | |
| Image Generation | ImageNet 50k samples (test) | FID2.6 | 35 | |
| Text-to-Image Generation | MSCOCO 30k samples 2014 (val) | FID21.96 | 35 |