Attention Concatenation Volume for Accurate and Efficient Stereo Matching

About

Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for accurate and efficient stereo matching. In this paper, we present a novel cost volume construction method that generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. To generate reliable attention weights, we propose multi-level adaptive patch matching, which improves the distinctiveness of the matching cost at different disparities, even in textureless regions. The proposed cost volume, named attention concatenation volume (ACV), can be seamlessly embedded into most stereo matching networks. The resulting networks can use a more lightweight aggregation network while achieving higher accuracy; for example, GwcNet achieves higher accuracy with only 1/25 of the parameters of its aggregation network. Furthermore, we design a highly accurate network (ACVNet) based on our ACV, which achieves state-of-the-art performance on several benchmarks.
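The core idea of the abstract, weighting a concatenation volume with attention derived from correlation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a plain per-pixel feature correlation and a softmax over the disparity axis as the attention weights, and it omits the multi-level adaptive patch matching and any learned layers. The function names and the (C, H, W) feature layout are hypothetical, chosen for illustration only.

```python
import numpy as np

def concat_volume(left, right, max_disp):
    """Stack left and shifted right features: output shape (2C, D, H, W)."""
    c, h, w = left.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Pixel x in the left image is paired with pixel x - d in the right image.
        vol[:c, d, :, d:] = left[:, :, d:]
        vol[c:, d, :, d:] = right[:, :, : w - d]
    return vol

def correlation_volume(left, right, max_disp):
    """Mean channel-wise product per disparity: output shape (D, H, W)."""
    c, h, w = left.shape
    vol = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        vol[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return vol

def attention_weights(corr):
    """Assumption: softmax over the disparity axis stands in for learned attention."""
    e = np.exp(corr - corr.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def attention_concat_volume(left, right, max_disp):
    """Filter the concatenation volume with correlation-derived weights."""
    att = attention_weights(correlation_volume(left, right, max_disp))
    # Broadcast the (D, H, W) weights over the 2C feature channels.
    return att[None] * concat_volume(left, right, max_disp)

rng = np.random.default_rng(0)
left_feat = rng.standard_normal((4, 3, 8))   # (C, H, W) feature maps
right_feat = rng.standard_normal((4, 3, 8))
acv = attention_concat_volume(left_feat, right_feat, max_disp=4)
print(acv.shape)
```

The weights only rescale the concatenation volume, so its shape is unchanged and it could be passed to an aggregation network in place of the unweighted volume.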

Gangwei Xu, Junda Cheng, Peng Guo, Xin Yang • 2022

Related benchmarks

Task	Dataset	Metric	Result	
Stereo Matching	KITTI 2015 (test)	D1 Error (Overall)	0.0234	245
Stereo Matching	KITTI 2015	D1 Error (All)	1.65	142
Stereo Matching	KITTI 2012	Error Rate (3px, All)	1.47	108
Stereo Matching	KITTI 2012 (test)	Outlier Rate (3px, Noc)	1.13	105
Stereo Matching	Scene Flow (test)	EPE	0.48	84
Stereo Matching	Middlebury	Bad Pixel Rate (Thresh 2.0)	19.61	84
Stereo Matching	Middlebury (test)	EPE	8.24	60
Stereo Matching	KITTI 2015 (all pixels)	D1 Error (Background)	1.37	48
Stereo Matching	KITTI Noc 2015	D1 Error (Background)	1.26	42
Stereo Matching	Scene Flow	EPE (px)	0.48	40

Showing 10 of 26 rows
