Tracking Everything Everywhere All at Once

About

We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. Prior optical flow or particle video tracking algorithms typically operate within limited temporal windows, struggling to track through occlusions and maintain global consistency of estimated motion trajectories. We propose a complete and globally consistent motion representation, dubbed OmniMotion, that allows for accurate, full-length motion estimation of every pixel in a video. OmniMotion represents a video using a quasi-3D canonical volume and performs pixel-wise tracking via bijections between local and canonical space. This representation allows us to ensure global consistency, track through occlusions, and model any combination of camera and object motion. Extensive evaluations on the TAP-Vid benchmark and real-world footage show that our approach outperforms prior state-of-the-art methods by a large margin both quantitatively and qualitatively. See our project page for more results: http://omnimotion.github.io/

Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, Noah Snavely • 2023
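
The abstract's idea of tracking "via bijections between local and canonical space" can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `AffineBijection` class, the `track` helper, and the per-axis affine maps are hypothetical stand-ins for whatever per-frame bijections the method actually learns. The sketch only shows the composition pattern: map a point from frame i into the shared canonical space, then map it back out into frame j.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class AffineBijection:
    """Toy per-frame bijection (hypothetical): canonical = scale * local + shift."""
    scale: Point  # every entry must be non-zero so the map stays invertible
    shift: Point

    def to_canonical(self, p: Point) -> Point:
        # Forward direction: local frame coordinates -> canonical coordinates.
        return tuple(s * x + t for s, x, t in zip(self.scale, p, self.shift))

    def to_local(self, u: Point) -> Point:
        # Inverse direction: canonical coordinates -> local frame coordinates.
        return tuple((c - t) / s for s, c, t in zip(self.scale, u, self.shift))


def track(p_i: Point, frame_i: int, frame_j: int, bijections: dict) -> Point:
    """Map a point in frame_i to frame_j by passing through canonical space."""
    u = bijections[frame_i].to_canonical(p_i)
    return bijections[frame_j].to_local(u)


# Two illustrative frames with made-up parameters.
bijections = {
    0: AffineBijection(scale=(1.0, 1.0, 1.0), shift=(0.0, 0.0, 0.0)),
    1: AffineBijection(scale=(2.0, 2.0, 1.0), shift=(4.0, -2.0, 0.0)),
}

p0 = (10.0, 6.0, 1.0)
p1 = track(p0, 0, 1, bijections)       # point in frame 0 -> frame 1
p0_back = track(p1, 1, 0, bijections)  # and back again
```

Because every frame maps into the same canonical point, tracking from frame 1 back to frame 0 recovers the starting point in this toy setup, which is the kind of global consistency the abstract refers to.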

Related benchmarks

Task	Dataset	Metric	Result
Point Tracking	TAP-Vid DAVIS (First)	Delta Avg (<c)	66.9	76
Point Tracking	TAP-Vid DAVIS (Strided)	Avg Delta Error	67.5	33
Point Tracking	TAP-Vid RGB-Stacking (test)	AJ	69.5	32
Video Reconstruction	DAVIS	--	--	29
2D Tracking	BADJA	SegA	57.2	20
Video Tracking	BADJA	delta_seg	6.9	15
Long-term Point Tracking	TAP-Vid DAVIS 480p (test)	Avg Temporal Error	74.1	12
Point Tracking	TAP-Vid DAVIS-480	Avg Displacement Error (x)	74.1	9
Dynamic Scene Reconstruction	DAVIS Key Frames	PSNR	24.11	8
Video Reconstruction	Tap-Vid DAVIS	PSNR	24.11	7

Showing 10 of 13 rows
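
For readers unfamiliar with the point-tracking scores above, here is a minimal sketch of an average position accuracy computation. It assumes the "Delta Avg" rows follow the usual TAP-Vid convention: the percentage of visible points whose predicted position lies within a pixel threshold of the ground truth, averaged over the thresholds 1, 2, 4, 8, and 16. The function name `delta_avg` and the input layout are hypothetical, and the official benchmark code may differ in details such as evaluation resolution.

```python
import math


def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Average position accuracy (in percent) over several pixel thresholds.

    pred, gt: sequences of (x, y) positions, one entry per point and frame.
    visible:  sequence of booleans; only visible ground-truth points count.
    """
    # Euclidean error for each point that is visible in the ground truth.
    errors = [math.dist(p, g) for p, g, v in zip(pred, gt, visible) if v]
    if not errors:
        raise ValueError("no visible points to evaluate")
    # Fraction of visible points strictly within each threshold (assumed strict).
    per_threshold = [sum(e < t for e in errors) / len(errors) for t in thresholds]
    return 100.0 * sum(per_threshold) / len(thresholds)


# Made-up example: three visible points with errors 0, 3, and 10 pixels,
# plus one occluded point that is ignored.
pred = [(0, 0), (3, 0), (0, 10), (50, 50)]
gt = [(0, 0), (0, 0), (0, 0), (0, 0)]
visible = [True, True, True, False]
score = delta_avg(pred, gt, visible)  # roughly 60.0
```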
