RaPD: Resolution-Agnostic Pixel Diffusion via Semantics-Enriched Implicit Representations
About
Natural images are continuous, yet most generative models synthesize them on discrete grids, limiting resolution-flexible generation. Continuous neural fields enable resolution-free rendering, but prior methods introduce continuity only at the decoding stage as an interpolation module, leaving the generative latent space discretized and reconstruction-oriented. We propose RaPD (Resolution-agnostic Pixel Diffusion), which performs diffusion in a continuous Neural Image Field (NIF) latent space. RaPD bridges this reconstruction-generation gap with Semantic Representation Guidance for generation-aware latent learning and a Coordinate-Queried Attention Renderer for coordinate-conditioned, scale-aware rendering. A single denoised latent can be rendered at arbitrary resolutions by changing only the query coordinates, keeping diffusion cost fixed. Experiments demonstrate superior generation quality and resolution scalability.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval 1.0 (test) | Overall Score85 | 130 | |
| Text-to-Image Generation | DPG-Bench (test) | Overall Fidelity81.6 | 68 |