Recursive Recurrent Nets with Attention Modeling for OCR in the Wild
About
We present recursive recurrent neural networks with attention modeling (R$^2$AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction; (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams; and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.
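The soft-attention step described above can be sketched as follows. This is a minimal, generic additive-attention sketch using NumPy, not the paper's exact formulation; the weight names (`W_f`, `W_h`, `v`) and all dimensions are illustrative assumptions. At each decoding step, the RNN state scores every image feature column, the scores are normalized with a softmax, and the context vector is the attention-weighted sum of the features:

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """One step of additive soft attention (illustrative sketch).

    features: (T, D) image feature columns from the CNN encoder
    hidden:   (H,)   current RNN decoder state
    Returns the context vector (D,) and the attention weights (T,).
    """
    # Unnormalized alignment scores: v^T tanh(W_f f_t + W_h h) for each position t
    scores = np.tanh(features @ W_f.T + hidden @ W_h.T) @ v  # (T,)
    # Softmax over the T feature positions (numerically stabilized)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context: attention-weighted sum of the image features
    context = weights @ features  # (D,)
    return context, weights

# Toy usage with made-up sizes
rng = np.random.default_rng(0)
T, D, H, A = 10, 32, 64, 48  # positions, feature dim, hidden dim, attention dim
features = rng.standard_normal((T, D))
hidden = rng.standard_normal(H)
W_f = rng.standard_normal((A, D)) * 0.1
W_h = rng.standard_normal((A, H)) * 0.1
v = rng.standard_normal(A)

context, weights = soft_attention(features, hidden, W_f, W_h, v)
```

Because every operation is differentiable, the attention weights can be trained end-to-end with standard backpropagation, as the abstract notes.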
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Scene Text Recognition | SVT (test) | Word Accuracy | 96.3 | 289 |
| Scene Text Recognition | IIIT5K (test) | Word Accuracy | 96.8 | 244 |
| Scene Text Recognition | IC13 (test) | Word Accuracy | 90 | 207 |
| Scene Text Recognition | IIIT5K | Accuracy | 96.8 | 149 |
| Scene Text Recognition | SVT 647 (test) | Accuracy | 82.4 | 101 |
| Text Recognition | Street View Text (SVT) | Accuracy | 96.3 | 80 |
| Scene Text Recognition | IC03 | Accuracy | 97.9 | 67 |
| Scene Text Recognition | SVT | Accuracy | 80.7 | 67 |
| Scene Text Recognition | IC13 | Accuracy | 90 | 66 |
| Scene Text Recognition | IC03 (test) | Accuracy | 97.9 | 63 |