Locality-Aware Generalizable Implicit Neural Representation
About
Generalizable implicit neural representation (INR) enables a single continuous function, i.e., a coordinate-based neural network, to represent multiple data instances by modulating its weights or intermediate features using latent codes. However, the expressive power of state-of-the-art modulation methods is limited by their inability to localize and capture fine-grained details of data entities such as specific pixels and rays. To address this issue, we propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder. The transformer encoder predicts a set of latent tokens from a data instance, encoding local information into each latent token. For a given coordinate input, the locality-aware INR decoder extracts a modulation vector by selectively aggregating the latent tokens via cross-attention, and then predicts the output by progressively decoding with coarse-to-fine modulation through multiple frequency bandwidths. The selective token aggregation and the multi-band feature modulation enable us to learn locality-aware representations in the spatial and spectral domains, respectively. Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks such as image generation.
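The decoding path described in the abstract (cross-attention over latent tokens for one coordinate, followed by coarse-to-fine decoding across frequency bandwidths) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `LocalityAwareDecoderSketch`, the helper `fourier_features`, the layer sizes, the bandwidth values, and the choice of additive modulation are all assumptions made for illustration, and the weights are random rather than trained. The latent tokens, which the abstract says come from a transformer encoder, are replaced here by random vectors.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def fourier_features(coord, bandwidth, num_freqs):
    """Sin/cos features of a coordinate with frequencies up to `bandwidth` (assumed form)."""
    freqs = np.linspace(1.0, bandwidth, num_freqs)      # (F,)
    angles = np.pi * np.outer(freqs, coord).ravel()     # (F * D,)
    return np.concatenate([np.sin(angles), np.cos(angles)])


class LocalityAwareDecoderSketch:
    """Hypothetical decoder: token aggregation via cross-attention, then multi-band decoding."""

    def __init__(self, coord_dim=2, token_dim=32, hidden_dim=64, out_dim=3,
                 bandwidths=(2.0, 8.0, 32.0), num_freqs=8, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            # Random weights scaled by fan-in; a trained model would learn these.
            return rng.normal(0.0, 1.0 / np.sqrt(cols), (rows, cols))

        feat_dim = 2 * num_freqs * coord_dim
        self.bandwidths, self.num_freqs = bandwidths, num_freqs
        self.token_dim, self.hidden_dim = token_dim, hidden_dim
        self.W_q = init(token_dim, feat_dim)    # coordinate features -> query
        self.W_k = init(token_dim, token_dim)   # latent tokens -> keys
        self.W_v = init(token_dim, token_dim)   # latent tokens -> values
        # One set of weights per frequency band, ordered coarse to fine.
        self.W_f = [init(hidden_dim, feat_dim) for _ in bandwidths]
        self.W_m = [init(hidden_dim, token_dim) for _ in bandwidths]
        self.W_h = [init(hidden_dim, hidden_dim) for _ in bandwidths]
        self.W_out = init(out_dim, hidden_dim)

    def __call__(self, coord, tokens):
        # 1) Selective token aggregation: the coordinate queries the latent tokens.
        q = self.W_q @ fourier_features(coord, self.bandwidths[0], self.num_freqs)
        keys = tokens @ self.W_k.T                       # (N, token_dim)
        values = tokens @ self.W_v.T                     # (N, token_dim)
        attn = softmax(keys @ q / np.sqrt(self.token_dim))
        modulation = attn @ values                       # (token_dim,)

        # 2) Coarse-to-fine decoding: each band adds higher-frequency features,
        #    shifted by the modulation vector (additive modulation is an assumption).
        h = np.zeros(self.hidden_dim)
        for band, W_f, W_m, W_h in zip(self.bandwidths, self.W_f, self.W_m, self.W_h):
            feats = fourier_features(coord, band, self.num_freqs)
            h = np.maximum(0.0, W_f @ feats + W_m @ modulation + W_h @ h)
        return self.W_out @ h, attn


# Usage: decode one pixel coordinate from 16 stand-in latent tokens.
decoder = LocalityAwareDecoderSketch()
tokens = np.random.default_rng(1).normal(size=(16, 32))
rgb, attn = decoder(np.array([0.25, -0.5]), tokens)
```

The attention weights `attn` form a distribution over the latent tokens, so each coordinate draws its modulation vector from the tokens most relevant to it. This corresponds to the spatial locality the abstract describes, while the loop over `bandwidths` corresponds to the spectral, coarse-to-fine part.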
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | ImageNet 256x256 (test) | FID | 9.3 | 46 |
| Image Reconstruction | CelebA 178 x 178 | PSNR | 50.74 | 9 |
| Image Reconstruction | Imagenette 178 x 178 | PSNR | 46.1 | 9 |
| Image Reconstruction | FFHQ 1024x1024 | PSNR | 31.94 | 6 |
| Image Reconstruction | FFHQ 256x256 | PSNR | 39.88 | 5 |
| Image Reconstruction | ImageNet 256x256 (test) | -- | -- | 5 |
| Image Reconstruction | CelebA 178x178 (test) | PSNR | 50.74 | 4 |
| Image Reconstruction | ImageNette 178x178 (test) | PSNR | 46.1 | 4 |
| Image Reconstruction | FFHQ 178x178 (test) | PSNR | 43.32 | 3 |
| Image Reconstruction | FFHQ 512x512 | PSNR | 35.43 | 3 |