Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

About

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. We propose an end-to-end trainable neural network model for scene text spotting. The proposed model, named Mask TextSpotter, is inspired by the recently published Mask R-CNN. Unlike previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter benefits from a simple and smooth end-to-end learning procedure, in which precise text detection and recognition are achieved via semantic segmentation. Moreover, it outperforms previous methods in handling text instances of irregular shapes, such as curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks.
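
To make "recognition via semantic segmentation" concrete, here is a simplified sketch of reading a word from per-pixel character maps. It is an illustration under stated assumptions, not the authors' implementation: the array layout (one foreground map `fg_prob` plus one map per symbol in `char_probs`), the 0.5 threshold, the 4-connectivity, and the helper names `connected_components` and `decode_word` are all hypothetical. Each connected foreground region is treated as one character, the character maps are averaged over that region (a simple voting rule), and regions are ordered left to right.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected components of a binary mask as lists of (y, x) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Breadth-first flood fill from this unvisited foreground pixel.
                queue = deque([(y, x)])
                seen[y][x] = True
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def decode_word(fg_prob, char_probs, alphabet, threshold=0.5):
    """Hypothetical decoder: read a word from character segmentation maps.

    fg_prob:    H x W map, probability that a pixel belongs to any character (assumed layout)
    char_probs: list of H x W maps, one per symbol in `alphabet` (assumed layout)
    """
    # Binarize the foreground map; each connected region is taken as one character.
    mask = [[p > threshold for p in row] for row in fg_prob]
    regions = []
    for pixels in connected_components(mask):
        # Vote: average every character map over the region and keep the best symbol.
        scores = [sum(cmap[y][x] for y, x in pixels) / len(pixels) for cmap in char_probs]
        best = max(range(len(alphabet)), key=lambda k: scores[k])
        center_x = sum(x for _, x in pixels) / len(pixels)
        regions.append((center_x, alphabet[best]))
    # Order characters by horizontal region centre (assumes text reads left to right).
    regions.sort()
    return "".join(ch for _, ch in regions)
```

For example, a 1 x 5 foreground map `[[0.9, 0.9, 0.1, 0.8, 0.8]]` yields two regions; if the "a" map is high on the left region and the "b" map on the right one, `decode_word(fg, [a_map, b_map], "ab")` returns `"ab"`. Averaging over a whole region, rather than trusting single pixels, makes the choice of symbol robust to a few noisy pixels.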

Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai · 2018

Related benchmarks

Task	Dataset	Metric	Result	#
Scene Text Recognition	SVT (test)	Word Accuracy	90.6	289
Scene Text Recognition	IIIT5K (test)	Word Accuracy	93.9	244
Scene Text Recognition	IC15 (test)	Word Accuracy	77.3	210
Scene Text Recognition	IC13 (test)	Word Accuracy	95.3	207
Text Detection	ICDAR 2015	--	--	188
Text Detection	Total-Text	Precision	81.8	160
Scene Text Recognition	SVTP (test)	Word Accuracy	82.2	153
Text Detection	Total-Text (test)	F-Measure	85.2	126
Text Detection	ICDAR 2015 (test)	F1 Score	87	108
Scene Text Detection	TotalText (test)	Recall	0.55	106

Showing 10 of 38 rows
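
The detection rows report precision, recall, and F-measure (F1 score). For reference, the standard F-measure is the harmonic mean of precision $P$ and recall $R$; it can only be computed when $P$ and $R$ come from the same evaluation, so the separate rows above cannot be combined this way.

```latex
F_1 = \frac{2 P R}{P + R}
```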
