An end-to-end TextSpotter with Explicit Alignment and Attention

About

Text detection and recognition in natural images have long been treated as two separate tasks that are processed sequentially. Training the two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulty. In this work, we present a conceptually simple yet efficient framework that processes the two tasks simultaneously in one shot. Our main contributions are three-fold: 1) we propose a novel text-alignment layer that precisely computes convolutional features of a text instance in arbitrary orientation, which is the key to boosting performance; 2) we introduce a character attention mechanism that uses character spatial information as explicit supervision, leading to large improvements in recognition; 3) the two techniques, together with a new RNN branch for word recognition, are integrated seamlessly into a single model that is end-to-end trainable. This allows the two tasks to work collaboratively by sharing convolutional features, which is critical for identifying challenging text instances. Our model achieves impressive results in end-to-end recognition on the ICDAR2015 dataset, significantly advancing the most recent results, with F-measure improving from (0.54, 0.51, 0.47) to (0.82, 0.77, 0.63) using a strong, weak and generic lexicon respectively. Thanks to joint training, our method can also serve as a good detector, achieving new state-of-the-art detection performance on two datasets.
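To make the idea of computing features for a text instance "in arbitrary orientation" more concrete, here is a minimal sketch of sampling a fixed-size grid from a rotated box with bilinear interpolation. It is not taken from the paper: the function names `bilinear` and `align_rotated_box`, the (center, size, angle) box parameterisation, and the single-channel list-of-lists feature map are all assumptions made for illustration.

```python
import math

def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp so that points slightly outside the map reuse border values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def align_rotated_box(feat, cx, cy, box_w, box_h, angle, out_w, out_h):
    """Sample an out_h x out_w grid from a box centered at (cx, cy), rotated by `angle` radians.

    Hypothetical helper: the box parameterisation is an assumption, not the paper's.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for i in range(out_h):
        # Offset of this grid row from the box center, measured at the bin center.
        v = ((i + 0.5) / out_h - 0.5) * box_h
        row = []
        for j in range(out_w):
            u = ((j + 0.5) / out_w - 0.5) * box_w
            # Rotate the in-box offset into feature-map coordinates.
            x = cx + u * cos_a - v * sin_a
            y = cy + u * sin_a + v * cos_a
            row.append(bilinear(feat, x, y))
        out.append(row)
    return out

# Toy feature map whose value equals its x coordinate.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
upright = align_rotated_box(ramp, 2.0, 2.0, 4.0, 2.0, 0.0, 4, 2)
rotated = align_rotated_box(ramp, 2.0, 2.0, 4.0, 2.0, math.pi / 2, 4, 2)
```

On the toy ramp, the upright box reads values that increase along each output row, while the box rotated by 90 degrees reads a constant value along each row, because its long axis now runs along the map's y direction. Sampling at real-valued positions rather than rounding to integer cells is what keeps the extracted features aligned with the oriented box.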

Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text Detection | ICDAR 2015 | Precision 87 | 171 |
| Scene Text Detection | ICDAR 2015 (test) | F1 Score 87 | 150 |
| Text Detection | ICDAR 2015 (test) | F1 Score 87 | 108 |
| Text Detection | ICDAR 2013 (test) | F1 Score 90 | 88 |
| Text Detection | MSRA-TD500 | Precision 71 | 84 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score 82 | 80 |
| Scene Text Detection | MSRA-TD500 (test) | Precision 71 | 65 |
| Word Spotting | ICDAR 2015 | Strong Score 85 | 42 |
| Text Spotting | ICDAR 2015 (test) | Accuracy (Strong Lexicon) 82 | 36 |
| End-to-End Text Spotting | ICDAR 2013 (test) | -- | 25 |
Showing 10 of 18 rows
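
For readers comparing the F1 and precision entries above, the following is a small sketch of the standard F-measure, the weighted harmonic mean of precision and recall. The function name `f_measure` and the input values are illustrative assumptions; they are not results from the paper or from the benchmark table.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        # Avoid division by zero; the score is defined as 0 in this case.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical values, chosen only to show the calculation.
score = f_measure(0.8, 0.6)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector with high precision but low recall will still receive a modest F1 score.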
