TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network

About

Reading text from images remains challenging because of multi-orientation, perspective distortion, and especially the curved nature of irregular text. Most existing approaches solve the problem in two or more stages, which is considered a bottleneck for optimizing overall performance. To address this issue, we propose an end-to-end trainable network architecture, named TextNet, which simultaneously localizes and recognizes irregular text in images. Specifically, we develop a scale-aware attention mechanism to learn multi-scale image features as a backbone network, sharing fully convolutional features and computation between localization and recognition. In the text detection branch, we directly generate text proposals as quadrangles, covering oriented, perspective, and curved text regions. To preserve text features for recognition, we introduce a perspective RoI transform layer, which aligns quadrangle proposals into small feature maps. Furthermore, to extract effective features for recognition, we encode the aligned RoI features with an RNN into context information and combine it with a spatial attention mechanism to generate text sequences. The overall pipeline can handle both regular and irregular cases. Finally, the text localization and recognition tasks are trained jointly in an end-to-end fashion with a designed multi-task loss. Experiments on standard benchmarks show that the proposed TextNet achieves state-of-the-art performance and outperforms existing approaches on irregular datasets by a large margin.
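The perspective RoI transform described above, which aligns a quadrangle proposal into a small fixed-size feature map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `homography_from_quad` and `perspective_roi_align`, the corner ordering, and the use of bilinear sampling are all assumptions made for illustration.

```python
import numpy as np


def homography_from_quad(quad, out_w, out_h):
    """Solve for a 3x3 matrix mapping output-grid corners onto the quadrangle.

    quad: four (x, y) corners in feature-map coordinates, assumed ordered
    top-left, top-right, bottom-right, bottom-left (an illustrative convention).
    """
    grid_corners = np.array(
        [[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]],
        dtype=np.float64,
    )
    quad = np.asarray(quad, dtype=np.float64)
    rows, rhs = [], []
    for (u, v), (x, y) in zip(grid_corners, quad):
        # x = (h0*u + h1*v + h2) / (h6*u + h7*v + 1), and likewise for y
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    h = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(h, 1.0).reshape(3, 3)


def perspective_roi_align(feat, quad, out_h, out_w):
    """Sample a (C, H, W) feature map inside a quadrangle onto a (C, out_h, out_w) grid."""
    channels, height, width = feat.shape
    hmat = homography_from_quad(quad, out_w, out_h)

    # Homogeneous coordinates of every output cell, shape (3, out_h * out_w).
    us, vs = np.meshgrid(np.arange(out_w), np.arange(out_h))
    grid = np.stack([us.ravel(), vs.ravel(), np.ones(us.size)])
    mapped = hmat @ grid
    xs = np.clip(mapped[0] / mapped[2], 0, width - 1)
    ys = np.clip(mapped[1] / mapped[2], 0, height - 1)

    # Bilinear interpolation between the four neighbouring feature cells.
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, width - 1)
    y1 = np.minimum(y0 + 1, height - 1)
    wx, wy = xs - x0, ys - y0
    out = (
        feat[:, y0, x0] * (1 - wx) * (1 - wy)
        + feat[:, y0, x1] * wx * (1 - wy)
        + feat[:, y1, x0] * (1 - wx) * wy
        + feat[:, y1, x1] * wx * wy
    )
    return out.reshape(channels, out_h, out_w)
```

For an axis-aligned rectangle the sketch reduces to an ordinary crop. For a skewed quadrangle, each output row follows the perspective of the text line, which is what allows the recognition branch to receive a fixed-size input.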

Yipeng Sun, Chengquan Zhang, Zuming Huang, Jiaming Liu, Junyu Han, Errui Ding • 2018

Related benchmarks

Task	Dataset	Result
Text Detection	ICDAR 2015	Precision 89.4	188
Text Detection	Total-Text	Precision 68.21	160
Scene Text Detection	ICDAR 2015 (test)	F1 Score 87.37	150
Text Detection	Total-Text (test)	F-Measure 63.5	126
Text Detection	ICDAR 2015 (test)	F1 Score 87.4	108
Scene Text Detection	TotalText (test)	Recall 59.45	106
Scene Text Spotting	Total-Text (test)	F-measure (None) 54.02	105
End-to-End Text Spotting	ICDAR 2015	Strong Score 78.7	104
Text Detection	ICDAR 2013 (test)	F1 Score 91.3	88
End-to-End Scene Text Spotting	Total-Text	Hmean (None) 54.02	80

Showing 10 of 19 rows
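The F-measure (also listed as F1 score or Hmean) is the harmonic mean of precision and recall. As a rough consistency check, the short sketch below combines the Total-Text precision (68.21) and recall (59.45) shown above. Treating those two rows as coming from the same evaluation is an assumption, since they appear under differently named task and dataset entries. The helper name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, on whatever scale the inputs use."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: both values come from the same Total-Text detection evaluation.
total_text_f = f_measure(68.21, 59.45)
print(f"Total-Text F-measure: {total_text_f:.2f}")
```

Under that assumption the result rounds to the 63.5 F-measure reported in the Total-Text (test) row.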
