Non-local Neural Networks

About

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net .

Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He• 2017

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU45.8	3069
Object Detection	COCO 2017 (val)	AP38	2843
Instance Segmentation	COCO 2017 (val)	--	1275
Semantic segmentation	Cityscapes (test)	mIoU82.5	1252
Image Classification	ImageNet (val)	Top-1 Acc22.91	1206
Classification	ImageNet-1K 1.0 (val)	Top-1 Accuracy (%)78.95	1171
Object Detection	PASCAL VOC 2007 (test)	--	844
Semantic segmentation	Cityscapes (val)	mIoU78.57	527
Action Recognition	Kinetics-400	Top-1 Acc77.7	498
Action Recognition	UCF101	Accuracy95.6	433

Showing 10 of 106 rows

...

Other info

Code

Follow for update

@wizwand_team Discord