Auto-regressive transformation for image alignment
About
Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale and field-of-view differences, and large deformations, often resulting in suboptimal accuracy. Robustness to these challenges can be improved through iterative refinement of the transform field while focusing on critical regions in multi-scale image representations. We thus propose Auto-Regressive Transformation (ART), a novel method that iteratively estimates coarse-to-fine transformations through an auto-regressive pipeline. Leveraging hierarchical multi-scale features, our network refines the transform field parameters using randomly sampled points at each scale. By incorporating guidance from the cross-attention layer, the model focuses on critical regions, ensuring accurate alignment even in challenging, feature-limited conditions. Extensive experiments demonstrate that ART significantly outperforms state-of-the-art methods on planar images and achieves comparable performance on 3D scene images, establishing it as a powerful and versatile solution for precise image alignment.
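The coarse-to-fine, auto-regressive structure described above can be illustrated with a small NumPy sketch. This is not the authors' network: `predict_residual` is a hypothetical stand-in that replaces the learned model with a least-squares affine fit, and it simulates coarse feature maps by observing the remaining displacement only at the resolution of a grid cell. The function names, the affine parameterization, the cell sizes, and the 256-pixel image extent are all assumptions made for illustration. What the sketch does show is the loop itself: sample random points at each scale, estimate a residual transform conditioned on the current estimate, and compose it onto that estimate.

```python
import numpy as np

def to_h(pts):
    """Append a homogeneous coordinate of 1 to an (N, 2) array of points."""
    return np.hstack([pts, np.ones((len(pts), 1))])

def apply_affine(T, pts):
    """Apply a 3x3 affine matrix to an (N, 2) array of points."""
    return (to_h(pts) @ T.T)[:, :2]

def fit_affine(src, dst):
    """Least-squares 3x3 affine matrix mapping src points onto dst points."""
    A, *_ = np.linalg.lstsq(to_h(src), dst, rcond=None)  # A has shape (3, 2)
    T = np.eye(3)
    T[:2, :] = A.T
    return T

def predict_residual(T_current, T_true, cell, rng, n_points=64):
    """Hypothetical stand-in for the network at one scale (an assumption).

    Samples random points, moves them with the current estimate, and observes
    the remaining displacement rounded to multiples of `cell`, mimicking the
    limited localization precision of a coarse feature map.
    """
    pts = rng.uniform(0.0, 256.0, size=(n_points, 2))
    moved = apply_affine(T_current, pts)
    residual = apply_affine(T_true, pts) - moved
    observed = np.round(residual / cell) * cell
    return fit_affine(moved, moved + observed)

def coarse_to_fine(T_true, cells=(32.0, 8.0, 2.0, 0.5), seed=0):
    """Auto-regressive loop: each step refines the previous estimate."""
    rng = np.random.default_rng(seed)
    T = np.eye(3)              # start from the identity transform
    history = [T]
    for cell in cells:         # coarse scales first, fine scales last
        delta = predict_residual(T, T_true, cell, rng)
        T = delta @ T          # compose the residual onto the running estimate
        history.append(T)
    return history

def corner_error(T_a, T_b, size=256.0):
    """Mean distance between the four image corners under two transforms."""
    corners = np.array([[0, 0], [size, 0], [size, size], [0, size]], dtype=float)
    diff = apply_affine(T_a, corners) - apply_affine(T_b, corners)
    return float(np.linalg.norm(diff, axis=1).mean())
```

Calling `coarse_to_fine` with a known affine matrix and evaluating `corner_error` on each entry of the returned history shows how the estimate changes from scale to scale. In this toy setup the residual observed at each step is accurate to within half a cell, so the finest scale determines the final precision.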
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Relative Pose Estimation | MegaDepth 1500 | -- | -- | 151 |
| Homography Estimation | HPatches | -- | -- | 55 |
| Retinal Image Alignment | FIRE | Acceptable Success Rate | 99.25 | 48 |
| Retinal Image Alignment | KBSMC | Acceptable Rate | 64.71 | 35 |
| Retinal Image Alignment | FLORI21 | Acceptable Rate | 100 | 35 |
| Two-view transformation estimation | ScanNet 1500 | mAUC | 51.1 | 6 |
| 2D geometric transformation | GoogleEarth Scene-LR | ACE | 0.17 | 5 |
| 2D geometric transformation | GoogleMap Scene-LR | Average Corner Error | 0.19 | 5 |
| 2D geometric transformation | MSCOCO Scene-LR | ACE | 0.05 | 5 |
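For readers unfamiliar with the Average Corner Error (ACE) metric listed above, the following sketch shows how it is commonly computed for homography estimation: the four image corners are warped by the estimated and the ground-truth homography, and the Euclidean distances between corresponding warped corners are averaged. The exact evaluation protocol of the listed benchmarks (image size, corner convention, units) is not stated here, so the function names and the pixel-corner convention below are assumptions.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]   # divide by the homogeneous coordinate

def average_corner_error(H_est, H_gt, width, height):
    """Mean distance between the four image corners warped by two homographies."""
    # Assumed convention: corners at pixel centers of the first and last row/column.
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
        dtype=float,
    )
    diff = warp_points(H_est, corners) - warp_points(H_gt, corners)
    return float(np.linalg.norm(diff, axis=1).mean())
```

As a sanity check, an estimate that differs from the ground truth by a pure translation of (3, 4) pixels moves every corner by 5 pixels, so the function returns 5.0 for that pair.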