MASTER: Multi-Aspect Non-local Network for Scene Text Recognition
About
Attention-based scene text recognizers have achieved great success by leveraging a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture. However, such methods suffer from the attention-drift problem, because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods are inefficient because they parallelize poorly. To overcome these problems, we propose MASTER, a self-attention-based scene text recognizer that (1) encodes not only the input-output attention but also self-attention, which captures feature-feature and target-target relationships inside the encoder and decoder; (2) learns a more powerful intermediate representation that is robust to spatial distortion; and (3) trains efficiently thanks to high training parallelization and runs fast at inference thanks to an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of MASTER on both regular and irregular scene text. PyTorch code can be found at https://github.com/wenwenyu/MASTER-pytorch, and TensorFlow code can be found at https://github.com/jiangxiluning/MASTER-TF.
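To make the feature-feature self-attention idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration only, not the MASTER implementation: it assumes identity query/key/value projections (no learned weights), and the function names `softmax` and `self_attention` are hypothetical. Every position attends to every other position in one pass, which is the property that allows parallel training, in contrast to step-by-step RNN attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the input features
    themselves (identity projections), so no parameters are learned.
    `features` is a list of n vectors, each of dimension d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        # Similarity of this position to every position (itself included).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        row = softmax(scores)
        # Each output is a weighted average of all feature vectors.
        mixed = [
            sum(row[j] * features[j][c] for j in range(len(features)))
            for c in range(dim)
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    outs, attn = self_attention(feats)
    for row in attn:
        print([round(w, 3) for w in row])
```

In this toy input the first two vectors are nearly identical, so they attend to each other more strongly than to the third.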
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Scene Text Recognition | SVT (test) | Word Accuracy | 90.6 | 289 |
| Scene Text Recognition | IIIT5K (test) | Word Accuracy | 95 | 244 |
| Scene Text Recognition | SVTP (test) | Word Accuracy | 84.5 | 153 |
| Scene Text Recognition | CUTE80 (test) | Accuracy | 0.875 | 87 |
| Scene Text Recognition | CUTE (test) | Accuracy | 87.5 | 59 |
| Scene Text Recognition | IC 2013 (test) | Accuracy | 95.3 | 51 |
| Scene Text Recognition | ICDAR 2015 (test) | Accuracy | 79.4 | 46 |
| Text Recognition | Chinese text recognition benchmark | Scene Acc | 62.8 | 33 |
| Scene Text Recognition | IIIT (test) | Accuracy | 95 | 30 |
| Scene Text Recognition | Standard STR Benchmark Suite Average (test) | Average Accuracy | 89.5 | 14 |
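
For readers unfamiliar with the metric, word accuracy is the share of test images whose predicted string matches the ground truth exactly. The sketch below shows one way to compute it as a percentage. The normalization step (lowercasing and keeping only ASCII letters and digits) is an assumption that reflects a common convention on these benchmarks; the exact protocol behind each row above is not stated here. The names `normalize` and `word_accuracy` are hypothetical.

```python
def normalize(text):
    """Assumed normalization: lowercase, keep ASCII letters and digits only."""
    return "".join(ch for ch in text.lower() if ch.isascii() and ch.isalnum())


def word_accuracy(predictions, ground_truths):
    """Percentage of predictions that exactly match their ground truth
    after normalization. A word counts as correct only if every
    character matches."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths differ in length")
    if not ground_truths:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        normalize(pred) == normalize(truth)
        for pred, truth in zip(predictions, ground_truths)
    )
    return 100.0 * correct / len(ground_truths)


if __name__ == "__main__":
    preds = ["Hello", "w0rld", "text!"]
    truths = ["hello", "world", "TEXT"]
    # Two of the three words match after normalization.
    print(f"{word_accuracy(preds, truths):.1f}")
```

Note that the table mixes two scales: most rows report a percentage, while the CUTE80 row reports a fraction (0.875). Dividing the function's result by 100 gives the fractional form.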