BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies

About

In this paper we propose BlockCopy, a scheme that accelerates pretrained frame-based CNNs so that they process video more efficiently than standard frame-by-frame processing. To this end, a lightweight policy network determines the important regions in an image, and operations are applied to the selected regions only, using custom block-sparse convolutions. Features of non-selected regions are simply copied from the preceding frame, reducing both the number of computations and the latency. The execution policy is trained online using reinforcement learning, without requiring ground-truth annotations. Our universal framework is demonstrated on dense prediction tasks such as pedestrian detection, instance segmentation and semantic segmentation, using both state-of-the-art (Center and Scale Predictor, MGAN, SwiftNet) and standard baseline networks (Mask R-CNN, DeepLabV3+). BlockCopy achieves significant FLOPs savings and inference speedup with minimal impact on accuracy.
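The copy-or-execute idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the names `block_copy_step` and `change_policy` are hypothetical, a per-pixel function `op` stands in for the block-sparse convolutions, and a hand-written change threshold stands in for the learned policy network (which the paper trains online with reinforcement learning).

```python
from typing import Callable, List

Frame = List[List[float]]  # single-channel 2D map stored as rows of values


def change_policy(frame: Frame, prev_frame: Frame, block: int,
                  threshold: float) -> List[List[bool]]:
    """Stand-in policy (assumption): flag a block for execution when its
    mean absolute change since the previous frame exceeds `threshold`."""
    height, width = len(frame), len(frame[0])
    rows = (height + block - 1) // block
    cols = (width + block - 1) // block
    mask = []
    for by in range(rows):
        mask_row = []
        for bx in range(cols):
            ys = range(by * block, min((by + 1) * block, height))
            xs = range(bx * block, min((bx + 1) * block, width))
            diff = sum(abs(frame[y][x] - prev_frame[y][x]) for y in ys for x in xs)
            mask_row.append(diff / (len(ys) * len(xs)) > threshold)
        mask.append(mask_row)
    return mask


def block_copy_step(frame: Frame, prev_output: Frame,
                    execute: List[List[bool]], block: int,
                    op: Callable[[float], float]) -> Frame:
    """Apply `op` only inside flagged blocks; copy all other features
    from the previous frame's output."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in prev_output]  # start from the copied features
    for by, mask_row in enumerate(execute):
        for bx, run in enumerate(mask_row):
            if not run:
                continue  # non-selected block: keep the copied values
            for y in range(by * block, min((by + 1) * block, height)):
                for x in range(bx * block, min((bx + 1) * block, width)):
                    out[y][x] = op(frame[y][x])
    return out


# Toy example: a 4x4 frame where only the top-left 2x2 block changes.
op = lambda v: 2.0 * v + 1.0  # placeholder for an expensive per-block operation
prev_frame = [[0.0] * 4 for _ in range(4)]
frame = [row[:] for row in prev_frame]
for y in range(2):
    for x in range(2):
        frame[y][x] = 1.0

prev_output = [[op(v) for v in row] for row in prev_frame]
mask = change_policy(frame, prev_frame, block=2, threshold=0.5)
output = block_copy_step(frame, prev_output, mask, block=2, op=op)
executed = sum(sum(row) for row in mask) / (len(mask) * len(mask[0]))
```

In this toy case only one block in four is executed, yet the output matches a full recomputation because the copied blocks did not change. With real convolutions the copied features are only approximately valid, since receptive fields cross block borders and unselected regions may still drift, which is why the choice of which blocks to execute matters.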

Thomas Verelst, Tinne Tuytelaars • 2021

Related benchmarks

Task | Dataset | Result | Rank
Video Object Detection | Video Object Detection (VOD) Benchmark | --
13
