
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

About

Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts. Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion and may not capture long-range temporal dependencies in dynamic scenes. To address this gap, we propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks, i.e., sequences of corresponding points across frames. By incorporating these motion cues, the Tracktention Layer enhances temporal alignment and effectively handles complex object motions, maintaining consistent feature representations over time. Our approach is computationally efficient and can be seamlessly integrated into existing models, such as Vision Transformers, with minimal modification. It can be used to upgrade image-only models to state-of-the-art video ones, sometimes outperforming models natively designed for video prediction. We demonstrate this on video depth prediction and video colorization, where models augmented with the Tracktention Layer exhibit significantly improved temporal consistency compared to baselines.
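The abstract describes the Tracktention Layer only at a high level. The following is a simplified, hypothetical NumPy sketch of the general idea. It assumes per-frame features of shape (T, H, W, C) and point tracks of shape (T, N, 2) already produced by some point tracker. Features are read at each track point, mixed along each track with single-head temporal attention, written back to the feature map, and added residually. Plain bilinear sampling and splatting stand in for whatever sampling and splatting the paper actually uses, and visibility flags are ignored. All function names and the weight matrices `Wq`, `Wk`, `Wv` are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical, simplified track-based attention layer (NOT the authors' code).
# Shapes: feats (T, H, W, C); tracks (T, N, 2) holding (x, y) positions on the
# feature grid for N tracked points in each of T frames (assumed precomputed).


def bilinear_weights(x, y, H, W):
    """Four (row, col, weight) corners for point (x, y), clamped to the grid."""
    x = float(np.clip(x, 0, W - 1))
    y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    dx, dy = x - x0, y - y0
    return [(y0, x0, (1 - dx) * (1 - dy)), (y0, x1, dx * (1 - dy)),
            (y1, x0, (1 - dx) * dy), (y1, x1, dx * dy)]


def sample_tracks(feats, tracks):
    """Read one feature vector per track point via bilinear sampling: (T, N, C)."""
    T, H, W, C = feats.shape
    N = tracks.shape[1]
    tokens = np.zeros((T, N, C))
    for t in range(T):
        for n in range(N):
            for r, c, w in bilinear_weights(*tracks[t, n], H, W):
                tokens[t, n] += w * feats[t, r, c]
    return tokens


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def track_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention over time, independently for each track."""
    seq = tokens.transpose(1, 0, 2)                       # (N, T, C)
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv                # (N, T, C) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (N, T, T)
    out = softmax(scores) @ v                             # (N, T, C)
    return out.transpose(1, 0, 2)                         # back to (T, N, C)


def splat_tracks(tokens, tracks, H, W, eps=1e-6):
    """Write track tokens back to the grid with normalized bilinear weights."""
    T, N, C = tokens.shape
    acc = np.zeros((T, H, W, C))
    norm = np.zeros((T, H, W, 1))
    for t in range(T):
        for n in range(N):
            for r, c, w in bilinear_weights(*tracks[t, n], H, W):
                acc[t, r, c] += w * tokens[t, n]
                norm[t, r, c] += w
    return acc / (norm + eps)  # pixels touched by no track receive zero update


def tracktention_sketch(feats, tracks, Wq, Wk, Wv):
    """Sample along tracks, attend over time, splat back, add residually."""
    T, H, W, C = feats.shape
    updated = track_attention(sample_tracks(feats, tracks), Wq, Wk, Wv)
    return feats + splat_tracks(updated, tracks, H, W)


# Toy usage with random features, random tracks, and random weights.
rng = np.random.default_rng(0)
T, H, W, C, N = 4, 8, 8, 16, 5
feats = rng.standard_normal((T, H, W, C))
tracks = rng.uniform(0, W - 1, size=(T, N, 2))
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out = tracktention_sketch(feats, tracks, Wq, Wk, Wv)
print(out.shape)  # (4, 8, 8, 16)
```

Because the update is added residually, setting `Wv` to zero leaves the input features unchanged. A residual design of this kind is one way a layer can be dropped into a pretrained image model with minimal disturbance; this sketch does not claim that the paper initializes its layer this way.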

Zihang Lai, Andrea Vedaldi • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Depth Estimation | Sintel (~50 frames) | AbsRel | 0.295 | 47 |
| Depth Estimation | KITTI (110 frames) | AbsRel | 10.4 | 46 |
| Video Depth Estimation | Bonn (110 frames) | AbsRel | 6.6 | 40 |
| Multi-view Depth Estimation | ETH3D, RobustMVD (test) | Relative error (rel) | 9.2 | 10 |
| Multi-view Depth Estimation | KITTI, RobustMVD (test) | Relative error (rel) | 6.9 | 10 |
| Multi-view Depth Estimation | ScanNet, RobustMVD (test) | Relative error (rel) | 4.5 | 10 |
| Multi-view Depth Estimation | DTU, RobustMVD (test) | Relative error (rel) | 7.3 | 10 |
| Video Colorization | DAVIS (medium frame length) | FID | 24.61 | 10 |
| Multi-view Depth Estimation | Tanks and Temples (T&T), RobustMVD (test) | Relative error (rel) | 3.2 | 10 |
| Video Colorization | Videvo (long frame length) | FID | 22.78 | 10 |

(10 of 11 benchmark rows shown.)
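AbsRel, the metric in the depth rows above, is the mean absolute relative error between predicted and ground-truth depth, mean(|d_pred - d_gt| / d_gt), so lower is better. A minimal NumPy sketch follows, assuming depth maps given as arrays and treating pixels with zero ground truth as invalid (the masking convention is an assumption). Video depth benchmarks usually align the prediction to the ground truth, for example with a per-sequence scale and shift, before computing AbsRel; that step is omitted here. This page does not state units: values such as 10.4 are presumably AbsRel × 100, but that is also an assumption.

```python
import numpy as np


def abs_rel(pred, gt, valid=None):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if valid is None:
        valid = gt > 0  # assumed convention: zero depth means "no ground truth"
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# The third pixel has no ground truth and is ignored: (0.2/2 + 1.0/4) / 2
print(round(abs_rel([2.2, 3.0, 5.0], [2.0, 4.0, 0.0]), 3))  # 0.175
```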
