MegaFlow: Zero-Shot Large Displacement Optical Flow
About
Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. In particular, we formulate flow estimation as a global matching problem by leveraging pre-trained global Vision Transformer features, which naturally capture large displacements. This is followed by a few lightweight iterative refinements that further improve sub-pixel accuracy. Extensive experiments demonstrate that MegaFlow achieves state-of-the-art zero-shot performance across multiple optical flow benchmarks. Moreover, our model delivers highly competitive zero-shot performance on long-range point tracking benchmarks, demonstrating its robust transferability and suggesting a unified paradigm for generalizable motion estimation. Our project page is at: https://kristen-z.github.io/projects/megaflow.
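To make the global matching idea concrete, the following is a minimal NumPy sketch of one common formulation: all-pairs feature similarity, a softmax over every location in the second frame, and the expected matched coordinate. It is an illustration under assumptions, not MegaFlow's implementation. The function name `global_matching_flow`, the `1/sqrt(C)` scaling, and the softmax-weighted readout are all hypothetical choices, and the inputs stand in for per-pixel features that would come from a pre-trained Vision Transformer.

```python
import numpy as np

def global_matching_flow(feat1, feat2):
    """Estimate flow from frame 1 to frame 2 by global feature matching.

    feat1, feat2: (H, W, C) per-pixel feature maps (hypothetical stand-ins
    for pre-trained ViT features). Returns an (H, W, 2) flow in (x, y) order.
    """
    h, w, c = feat1.shape
    f1 = feat1.reshape(h * w, c)
    f2 = feat2.reshape(h * w, c)

    # All-pairs similarity: every pixel in frame 1 against every pixel in
    # frame 2, so the search range is the whole image rather than a window.
    corr = f1 @ f2.T / np.sqrt(c)

    # Numerically stable softmax over frame-2 locations.
    corr = corr - corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel coordinate grid in (x, y) order, flattened to (H*W, 2).
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matched coordinate under the matching distribution,
    # minus the source coordinate, gives the displacement.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)
```

Because the similarity is computed over all location pairs, a displacement of any size within the image costs the same to find as a small one, which is why this style of matching suits large motions. The memory cost grows with `(H*W)^2`, so a sketch like this is normally run on downsampled feature maps, with a separate refinement stage recovering sub-pixel detail.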
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Optical Flow | Sintel (train) | AEPE (Clean) 0.85 | 200 |
| Optical Flow Estimation | Sintel Final (test) | EPE 2.43 | 133 |
| Optical Flow Estimation | Sintel clean (test) | EPE 0.91 | 120 |
| Optical Flow Estimation | KITTI 2015 (test) | Fl-all 3.94 | 108 |
| Optical Flow | KITTI (train) | Fl-all 0.107 | 84 |
| Point Tracking | DAVIS TAP-Vid | -- | 52 |
| Point Tracking | TAP-Vid Kinetics | -- | 48 |
| Optical Flow | Spring (test) | EPE 0.349 | 32 |
| Point Tracking | RoboTAP | -- | 22 |
| Point Tracking | TAP-Vid | DAVIS Score 77.6 | 15 |
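For readers unfamiliar with the flow metrics in the table, here is a short NumPy sketch of how EPE (end-point error) and Fl-all are typically computed. It assumes the standard definitions: EPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and Fl-all follows the KITTI 2015 convention of counting a pixel as an outlier when its error is at least 3 px and at least 5% of the ground-truth magnitude. The function names and the optional `valid` mask argument are hypothetical, and benchmark servers may differ in details such as masking.

```python
import numpy as np

def epe(flow_pred, flow_gt, valid=None):
    """Mean end-point error in pixels over valid pixels.

    flow_pred, flow_gt: (H, W, 2) arrays; valid: optional (H, W) bool mask.
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    return float(err[valid].mean())

def fl_all(flow_pred, flow_gt, valid=None):
    """Percentage of valid pixels that are outliers (KITTI 2015 convention).

    A pixel is an outlier when its end-point error is >= 3 px AND
    >= 5% of the ground-truth flow magnitude.
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    mag = np.linalg.norm(flow_gt, axis=-1)
    if valid is None:
        valid = np.ones(err.shape, dtype=bool)
    outlier = (err >= 3.0) & (err >= 0.05 * mag)
    return float(100.0 * outlier[valid].mean())
```

Lower is better for both metrics. EPE averages error magnitude, so a few very large errors can dominate it, while Fl-all only counts how many pixels exceed the threshold.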