Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

About

Temporal consistency is critical in video prediction to ensure that outputs are coherent across frames and free of artifacts. Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion and may fail to capture long-range temporal dependencies in dynamic scenes. To address this gap, we propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks, i.e., sequences of corresponding points across frames. By incorporating these motion cues, the Tracktention Layer improves temporal alignment, handles complex object motion effectively, and keeps feature representations consistent over time. Our approach is computationally efficient and can be integrated into existing models, such as Vision Transformers, with minimal modification. It can be used to upgrade image-only models into state-of-the-art video models, sometimes outperforming models designed natively for video prediction. We demonstrate this on video depth prediction and video colorization, where models augmented with the Tracktention Layer exhibit significantly improved temporal consistency compared to baselines.

Zihang Lai, Andrea Vedaldi · 2025
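The abstract describes the layer only at a high level. As a rough illustration of the general idea (pooling features along point tracks so that the same physical point receives consistent features over time), here is a toy sketch in plain Python. It is not the authors' architecture: the function names (`sample_along_tracks`, `temporal_attention`, `tracktention_sketch`), the nearest-neighbour sampling, the parameter-free attention, and the `alpha` residual blend are all hypothetical simplifications chosen to keep the example small, and the tracks are assumed to be given as `(x, y)` pixel positions per frame.

```python
import copy
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_index(x, y, height, width):
    """Round a sub-pixel (x, y) track position to a valid pixel index."""
    xi = min(max(int(round(x)), 0), width - 1)
    yi = min(max(int(round(y)), 0), height - 1)
    return xi, yi


def sample_along_tracks(features, tracks):
    """Gather one feature vector per (track, frame).

    features[t][y][x] is a list of C floats; tracks[n][t] is an (x, y) pixel
    position. Nearest-neighbour lookup stands in for a learned sampling step.
    """
    height, width = len(features[0]), len(features[0][0])
    sampled = []
    for track in tracks:
        seq = []
        for t, (x, y) in enumerate(track):
            xi, yi = pixel_index(x, y, height, width)
            seq.append(list(features[t][yi][xi]))
        sampled.append(seq)
    return sampled


def temporal_attention(seq):
    """Parameter-free single-head self-attention over the frames of one track."""
    dim = len(seq[0])
    mixed = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        mixed.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(dim)])
    return mixed


def tracktention_sketch(features, tracks, alpha=0.5):
    """Sample features along each track, mix them over time, splat a residual back.

    Pixels that no track visits are left unchanged; the input is not modified.
    """
    height, width = len(features[0]), len(features[0][0])
    out = copy.deepcopy(features)
    for track, seq in zip(tracks, sample_along_tracks(features, tracks)):
        for t, ((x, y), mixed) in enumerate(zip(track, temporal_attention(seq))):
            xi, yi = pixel_index(x, y, height, width)
            original = features[t][yi][xi]
            # Residual update: nudge the pixel toward its track-aggregated feature.
            out[t][yi][xi] = [
                cur + alpha * (new - old)
                for cur, new, old in zip(out[t][yi][xi], mixed, original)
            ]
    return out


# Toy example: one bright point moves right by one pixel per frame, but its
# value flickers (0.4 instead of 1.0) in the middle frame.
frames = [
    [[[1.0], [0.0], [0.0]]],
    [[[0.0], [0.4], [0.0]]],
    [[[0.0], [0.0], [1.0]]],
]
track = [(0, 0), (1, 0), (2, 0)]
smoothed = tracktention_sketch(frames, [track])
print([smoothed[t][0][t][0] for t in range(3)])
```

In this toy run the flickering middle value is pulled up from 0.4 to roughly 0.62 while the two endpoints drop slightly to about 0.94, so the spread along the track shrinks from 0.6 to about 0.32, and pixels that no track visits stay at 0.0. The residual form is one simple way to leave a host model's features untouched away from the tracks; a practical layer would use learned projections in place of these fixed rules.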

Related benchmarks

Task | Dataset | Metric | Result | Rank
Depth Estimation | Sintel (~50 frames) | AbsRel | 0.295 | 14
Depth Estimation | KITTI (110 frames) | AbsRel | 10.4 | 14
Multi-view Depth Estimation | ETH3D, RobustMVD (test) | Relative Error (rel) | 9.2 | 10
Multi-view Depth Estimation | KITTI, RobustMVD (test) | Relative Error (rel) | 6.9 | 10
Multi-view Depth Estimation | ScanNet, RobustMVD (test) | Relative Error (rel) | 4.5 | 10
Multi-view Depth Estimation | DTU, RobustMVD (test) | Relative Error (rel) | 7.3 | 10
Video Colorization | DAVIS (medium frame length) | FID | 24.61 | 10
Multi-view Depth Estimation | Tanks and Temples (T&T), RobustMVD (test) | Relative Error (rel) | 3.2 | 10
Video Colorization | Videvo (long frame length) | FID | 22.78 | 10
Video Depth Estimation | ScanNet (90 frames) | AbsRel | 0.087 | 8

(10 of 11 benchmark rows shown.)
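Several depth rows report AbsRel. Under its common definition, it is the mean of |d̂ − d| / d over pixels with valid ground-truth depth d, where d̂ is the predicted depth. The function below is a plain-Python sketch of that definition only; the name `abs_rel` and the rule of skipping non-positive ground truth are assumptions, and it omits any per-frame or per-video scale and shift alignment that a benchmark protocol may apply before scoring. Some leaderboards also report relative errors as percentages.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error, |pred - gt| / gt, over valid pixels.

    Pixels with non-positive ground truth are treated as invalid and skipped.
    """
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no pixels with valid ground-truth depth")
    return sum(errors) / len(errors)


# Flattened toy depth maps (metres); the last pixel has no ground truth.
pred = [2.0, 3.0, 5.0]
gt = [2.0, 4.0, 0.0]
print(abs_rel(pred, gt))  # 0.125
```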
