Deep Feature Flow for Video Recognition
About
Deep convolutional neural networks have achieved great success on image recognition tasks. Yet it is non-trivial to transfer state-of-the-art image recognition networks to videos, because per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup because flow computation is relatively fast. End-to-end training of the whole architecture significantly boosts recognition accuracy. Deep feature flow is flexible and general. It is validated on two recent large-scale video datasets, and it makes a large step towards practical video recognition.
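The key-frame scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `feature_net`, `flow_net`, `task_head`, the fixed `key_interval`, the `(dx, dy)` flow layout, and the border clamping are all assumptions made for illustration. The expensive network runs only on key frames; every other frame reuses the key-frame features, bilinearly warped by a flow field.

```python
import numpy as np


def warp_features(key_feat, flow):
    """Bilinearly sample key-frame features at flow-displaced positions.

    key_feat: (C, H, W) feature map computed on the key frame.
    flow:     (2, H, W) array (assumed layout): flow[0] is dx, flow[1] is dy,
              so location (y, x) in the current frame reads from
              (y + dy, x + dx) in the key frame.
    """
    _, h, w = key_feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling positions, clamped to the feature-map border (an assumption).
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as the bilinear interpolation weights.
    wx = sx - x0
    wy = sy - y0

    top = key_feat[:, y0, x0] * (1 - wx) + key_feat[:, y0, x1] * wx
    bottom = key_feat[:, y1, x0] * (1 - wx) + key_feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def run_video(frames, feature_net, flow_net, task_head, key_interval=10):
    """Run recognition on a video, computing deep features on key frames only.

    feature_net, flow_net and task_head are hypothetical callables standing in
    for the expensive sub-network, the flow estimator and the task network.
    """
    outputs = []
    key_frame = key_feat = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            # Key frame: pay for the expensive feature network.
            key_frame, key_feat = frame, feature_net(frame)
            feat = key_feat
        else:
            # Non-key frame: estimate flow to the key frame and warp its features.
            feat = warp_features(key_feat, flow_net(key_frame, frame))
        outputs.append(task_head(feat))
    return outputs
```

With `key_interval=10`, `feature_net` is called once per ten frames, which is where the speedup would come from as long as `flow_net` is much cheaper than `feature_net`.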
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (val) | mIoU | 69.2 | 572 |
| Video Object Detection | ImageNet VID (val) | mAP (%) | 73.1 | 341 |
| Video Semantic Segmentation | Cityscapes (val) | mIoU | 70.1 | 91 |
| Semantic segmentation | CamVid | mIoU | 66 | 61 |
| Semantic Video Segmentation | Cityscapes (test) | mIoU | 68.7 | 24 |
| Breast Lesion Detection | BLUVD-186 (test) | AP | 25.8 | 12 |
| Semantic segmentation | UAVid 8 semantic classes (val) | mIoU | 77.2 | 12 |
| Semantic segmentation | RuralScapes 12 semantic classes (val) | mIoU | 62.66 | 12 |