Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers

About

Stereo depth estimation relies on optimal correspondence matching between pixels on epipolar lines in the left and right images to infer depth. In this work, we revisit the problem from a sequence-to-sequence correspondence perspective to replace cost volume construction with dense pixel matching using position information and attention. This approach, named STereo TRansformer (STTR), has several advantages: It 1) relaxes the limitation of a fixed disparity range, 2) identifies occluded regions and provides confidence estimates, and 3) imposes uniqueness constraints during the matching process. We report promising results on both synthetic and real-world datasets and demonstrate that STTR generalizes across different domains, even without fine-tuning.
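The matching idea in the abstract can be illustrated with a minimal sketch, which is not the STTR implementation. It assumes a rectified stereo pair, so that corresponding pixels share a scanline, and it assumes each pixel already has a feature vector. The function name `match_epipolar_line` and the toy features are hypothetical. Plain dot-product scores with a softmax stand in for the paper's attention and position encoding, and the uniqueness constraint and occlusion handling mentioned above are deliberately left out. Every pixel on the right scanline is a candidate, so the sketch has no maximum-disparity parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_epipolar_line(left_feats, right_feats):
    """Match each left pixel against every right pixel on one scanline.

    Returns a list of (disparity, confidence) tuples, one per left pixel.
    Simplification: no uniqueness constraint and no occlusion handling.
    """
    results = []
    for x_left, f_left in enumerate(left_feats):
        # Similarity of this left pixel to every candidate on the right line.
        scores = [sum(a * b for a, b in zip(f_left, f_right))
                  for f_right in right_feats]
        weights = softmax(scores)
        # Pick the most strongly attended right pixel.
        x_right = max(range(len(weights)), key=weights.__getitem__)
        # The attention weight doubles as a crude confidence estimate.
        results.append((x_left - x_right, weights[x_right]))
    return results


# Toy scanline: the left view is the right view shifted by one pixel.
right = [[5.0, 0, 0, 0], [0, 5.0, 0, 0], [0, 0, 5.0, 0], [0, 0, 0, 5.0]]
left = [[0.0, 0, 0, 0], right[0], right[1], right[2]]
matches = match_epipolar_line(left, right)
```

In this toy example the first left pixel has no counterpart, so its attention is spread evenly across the right scanline and its confidence is low. The remaining pixels each lock onto a single match with a disparity of one pixel.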

Zhaoshuo Li, Xingtong Liu, Nathan Drenkow, Andy Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Stereo Matching | KITTI 2015 (test) | -- | 144 |
| Stereo Matching | KITTI 2012 (test) | -- | 76 |
| Stereo Matching | ETH3D (test) | -- | 30 |
| Stereo Matching | KITTI 15 | D1 Error (%): 8.31 | 27 |
| Stereo Matching | ETH3D (train) | Bad 1.0 Rate: 17.2 | 23 |
| Stereo Matching | Middlebury quarter resolution (test) | Threshold Error Rate: 9.7 | 19 |
| Stereo Matching | Middlebury half resolution (test) | Threshold Error Rate: 15.5 | 19 |
| Depth Estimation | Gated Stereo Day 1.0 (test) | RMSE: 16.77 | 19 |
| Depth Estimation | Gated Stereo Night 1.0 (test) | RMSE: 20.99 | 19 |
| Stereo Matching | SCARED Set 2 Original 2019 (test) | KF1 Score: 7.42 | 12 |
Showing 10 of 14 rows
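
For readers unfamiliar with the metrics named in the table, the sketch below shows how three of them are commonly computed. It assumes the usual definitions: a bad-pixel rate counts pixels whose absolute disparity error exceeds a threshold, the KITTI-style D1 error counts pixels whose error exceeds both 3 px and 5% of the true disparity, and RMSE is the root of the mean squared error. The function names and the convention that a ground-truth value of 0 marks an invalid pixel are assumptions made for illustration. The listed results were not necessarily produced with this code.

```python
import math


def _valid_pairs(pred, gt):
    """Keep only pixels with ground truth (assumed convention: gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def bad_pixel_rate(pred, gt, threshold=1.0):
    """Percentage of valid pixels with absolute error above `threshold` px."""
    pairs = _valid_pairs(pred, gt)
    bad = sum(1 for p, g in pairs if abs(p - g) > threshold)
    return 100.0 * bad / len(pairs)


def d1_error(pred, gt):
    """Percentage of valid pixels with error > 3 px and > 5% of true disparity."""
    pairs = _valid_pairs(pred, gt)
    bad = sum(1 for p, g in pairs
              if abs(p - g) > 3.0 and abs(p - g) > 0.05 * g)
    return 100.0 * bad / len(pairs)


def rmse(pred, gt):
    """Root mean squared error over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


# Toy values: the absolute errors are 0, 2, 5 and 4.
pred = [10.0, 12.0, 50.0, 100.0]
gt = [10.0, 10.0, 45.0, 104.0]
bad1 = bad_pixel_rate(pred, gt, threshold=1.0)
d1 = d1_error(pred, gt)
err = rmse(pred, gt)
```

In the toy data the last pixel is off by 4 px but still passes the D1 test, because 4 px is less than 5% of its true disparity of 104. The relative tolerance matters most for large disparities, which correspond to nearby objects.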
