Fast Algorithms for Convolutional Neural Networks

About

Deep convolutional neural networks take GPU days of compute time to train on large data sets. Pedestrian detection for self-driving cars requires very low latency. Image recognition for mobile phones is constrained by limited processing resources. The success of convolutional neural networks in these situations is limited by how fast we can compute them. Conventional FFT-based convolution is fast for large filters, but state-of-the-art convolutional neural networks use small 3x3 filters. We introduce a new class of fast algorithms for convolutional neural networks using Winograd's minimal filtering algorithms. The algorithms compute minimal-complexity convolution over small tiles, which makes them fast with small filters and small batch sizes. We benchmark a GPU implementation of our algorithm with the VGG network and show state-of-the-art throughput at batch sizes from 1 to 64.

Andrew Lavin, Scott Gray • 2015
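
To make the idea of minimal filtering over small tiles concrete, here is a minimal NumPy sketch of the smallest case, F(2,3), and its nested 2D form F(2x2,3x3). The transform matrices are the standard ones for F(2,3). The function names (`winograd_f2_3`, `winograd_f2x2_3x3`, `direct_correlation_2d`) are illustrative choices for this sketch, not the authors' GPU implementation, and the code processes a single tile with no batching, channels, or tiling of a larger image.

```python
import numpy as np

# Standard transform matrices for Winograd F(2,3):
# 2 outputs of a 3-tap filter from a 4-element input tile.
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=np.float64)   # input transform B^T
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]], dtype=np.float64)  # filter transform G
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=np.float64)  # output transform A^T


def winograd_f2_3(d, g):
    """1D F(2,3): d has 4 elements, g has 3 taps; returns 2 outputs.

    The elementwise product uses 4 multiplications instead of the 6
    needed by the direct sliding-window computation.
    """
    return AT @ ((G @ g) * (BT @ d))


def winograd_f2x2_3x3(d, g):
    """2D F(2x2,3x3): d is a 4x4 tile, g is a 3x3 filter; returns 2x2."""
    U = G @ g @ G.T        # transformed filter, 4x4
    V = BT @ d @ BT.T      # transformed input tile, 4x4
    return AT @ (U * V) @ AT.T  # 16 elementwise multiplications


def direct_correlation_2d(d, g):
    """Reference: direct 'valid' correlation of a 4x4 tile with a 3x3 filter."""
    out = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.standard_normal((4, 4))
    g = rng.standard_normal((3, 3))
    print(np.allclose(winograd_f2x2_3x3(d, g), direct_correlation_2d(d, g)))
```

In the 2D case the elementwise product takes 16 multiplications per 2x2 output tile, against 36 for the direct computation, a 2.25x reduction in multiplications. The filter transform `U` depends only on the filter, so in practice it can be computed once and reused across tiles.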

Related benchmarks

Task	Dataset	Metric	Value
Image Classification	CIFAR-10 (val)	Top-1 Accuracy	94.5	377
Image Classification	ImageNet V2	Std Accuracy	82.3	18
Conditioning Analysis of Winograd Transforms	Winograd F(m, 3) Tiles	Kappa2(V)	1.97e+5	7
Image Classification	ImageNet 640 samples (val)	Kappa^2	1.97e+5	6
INT8 Winograd Convolution	Winograd Convolution Random Pairs Uniform [-1, 1]	Standard INT8 Relative L2 Error	0.013	4
Numerical Stability Optimization	F(4,3)	Kappa2 Score	42.5	3
Numerical Stability Optimization	F(6,3)	Kappa2 Score	2.08e+3	3
Numerical Optimization of Vandermonde Arithmetic	F(2,3) tile 3x3 kernels 4 interpolation points	Kappa 2 Score	3.2	2
Numerical Optimization of Vandermonde Arithmetic	F(4,3) tile 3x3 kernels 6 interpolation points	Kappa 2	42.5	2
Numerical Optimization of Vandermonde Arithmetic	F(6,3) tile 3x3 kernels 8 interpolation points	Kappa 2	2.08e+3	2
