Stable and expressive recurrent vision models
About
Primate vision depends on recurrent processing for reliable perception. A growing body of literature also suggests that recurrent connections improve the learning efficiency and generalization of vision models on classic computer vision challenges. Why, then, are current large-scale challenges dominated by feedforward networks? We posit that the effectiveness of recurrent vision models is bottlenecked by the standard algorithm used for training them, "back-propagation through time" (BPTT), which has O(N) memory complexity for training an N-step model. Thus, recurrent vision model design is bounded by memory constraints, forcing a choice between rivaling the enormous capacity of leading feedforward models or trying to compensate for this deficit through granular and complex dynamics. Here, we develop a new learning algorithm, "contractor recurrent back-propagation" (C-RBP), which alleviates these issues by achieving constant O(1) memory complexity in the number of recurrent processing steps. We demonstrate that recurrent vision models trained with C-RBP can detect long-range spatial dependencies in a synthetic contour tracing task that BPTT-trained models cannot. We further show that recurrent vision models trained with C-RBP to solve the large-scale Panoptic Segmentation MS-COCO challenge outperform the leading feedforward approach, with fewer free parameters. C-RBP is a general-purpose learning algorithm for any application that can benefit from expansive recurrent dynamics. Code and data are available at https://github.com/c-rbp.
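The memory contrast between BPTT and a recurrent back-propagation scheme can be illustrated on a toy problem. The sketch below is not the authors' C-RBP implementation: it applies classical recurrent back-propagation (implicit differentiation at a fixed point) to a hypothetical scalar recurrence h <- tanh(w*h + x), and it assumes |w| < 1 so that the update is a contraction. All function names and parameter values are illustrative assumptions.

```python
import math


def step(h, x, w):
    """One recurrent update; a contraction in h whenever |w| < 1."""
    return math.tanh(w * h + x)


def bptt_grad(x, y, w, n_steps):
    """dL/dw for L = 0.5 * (h_N - y)^2, keeping every hidden state: O(N) memory."""
    states = [0.0]
    for _ in range(n_steps):
        states.append(step(states[-1], x, w))
    grad_h = states[-1] - y              # dL/dh_N
    grad_w = 0.0
    for t in range(n_steps, 0, -1):
        local = 1.0 - states[t] ** 2     # tanh'(pre-activation) at step t
        grad_w += grad_h * local * states[t - 1]
        grad_h = grad_h * local * w      # propagate to h_{t-1}
    return grad_w, len(states)


def rbp_grad(x, y, w, n_forward, n_backward):
    """dL/dw at the fixed point, keeping only the current state: O(1) memory."""
    h = 0.0
    for _ in range(n_forward):
        h = step(h, x, w)                # overwrite; no trajectory is stored
    local = 1.0 - h ** 2                 # tanh' at the (approximate) fixed point
    jac = w * local                      # dF/dh, with |jac| < 1 by assumption
    df_dw = local * h                    # dF/dw
    v = 0.0
    for _ in range(n_backward):
        v = jac * v + (h - y)            # Neumann series for (1 - jac)^-1 * dL/dh
    return v * df_dw


def fixed_point_loss(x, y, w, n_steps=500):
    """Loss evaluated at the converged state, used for a finite-difference check."""
    h = 0.0
    for _ in range(n_steps):
        h = step(h, x, w)
    return 0.5 * (h - y) ** 2


if __name__ == "__main__":
    g_bptt, n_stored = bptt_grad(x=0.3, y=0.8, w=0.5, n_steps=200)
    g_rbp = rbp_grad(x=0.3, y=0.8, w=0.5, n_forward=200, n_backward=200)
    print("BPTT gradient:", g_bptt, "states stored:", n_stored)
    print("RBP gradient: ", g_rbp, "states stored: 1")
```

On this toy problem the two gradients agree once the recurrence has converged, while the BPTT list of stored states grows with the number of steps and the fixed-point version keeps a single state. The agreement relies on the contraction assumption; without it the backward iteration need not converge.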
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Maze Solving | Mazes-19 | Accuracy (Mazes-19) | 2.93 | 7 |
| Maze Solving | Mazes-25 | Accuracy | 0.01 | 7 |
| Visual Reasoning | Mazes Mixed | Accuracy | 78.33 | 7 |
| Path Finding | PathFinder-21 | Accuracy | 50 | 7 |
| Path Finding | PathFinder 24 | Accuracy | 50 | 7 |
| Visual Reasoning | PathFinder Mixed | Accuracy | 50 | 7 |