
MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

About

Recently, end-to-end scene text spotting has become a popular research topic due to its advantages of global optimization and high maintainability in real applications. Most methods attempt to develop various region of interest (RoI) operations to concatenate the detection part and the sequence recognition part into a two-stage text spotting framework. However, in such a framework, the recognition part is highly sensitive to the detected results (e.g., the compactness of text contours). To address this problem, in this paper, we propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO, in which character sequences can be directly recognized without RoI operations. Concretely, a position-aware mask attention module is developed to generate attention weights on each text instance and its characters. It allows different text instances in an image to be allocated to different feature map channels, which are further grouped as a batch of instance features. Finally, a lightweight sequence decoder is applied to generate the character sequences. It is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting and can be trained end-to-end with only coarse position information (e.g., rectangular bounding boxes) and text annotations. Experimental results show that the proposed method achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks, i.e., ICDAR 2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
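
The mask-attention idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the tensor shapes, the spatial softmax normalization, and the names `spatial_softmax` and `pool_instance_features` are all assumptions made for illustration. It only shows how attention maps for K instance slots and L character positions could pool a shared feature map into a batch of per-instance features without any RoI operation.

```python
import numpy as np

def spatial_softmax(logits):
    """Softmax over the last two (H, W) axes so each attention map sums to 1.

    Assumption: the abstract does not say how attention weights are normalized.
    """
    flat = logits.reshape(*logits.shape[:-2], -1)
    flat = flat - flat.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights.reshape(logits.shape)

def pool_instance_features(features, attn_logits):
    """Pool a shared feature map with per-instance, per-character attention.

    features:    (C, H, W) feature map of the whole image
    attn_logits: (K, L, H, W) one map per instance slot K and character slot L
    returns:     (K, L, C) a batch of K feature sequences of length L
    """
    attn = spatial_softmax(attn_logits)
    # Weighted sum over spatial positions for every (instance, character) pair.
    return np.einsum("klhw,chw->klc", attn, features)

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
C, H, W, K, L = 8, 16, 16, 4, 5
features = rng.standard_normal((C, H, W))
attn_logits = rng.standard_normal((K, L, H, W))

instance_feats = pool_instance_features(features, attn_logits)
print(instance_feats.shape)  # (4, 5, 8)
```

Under these assumptions, the resulting (K, L, C) array is the kind of batched input a lightweight sequence decoder could consume, with one sequence per text instance.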

Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, Fei Wu • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Text Detection | ICDAR 2015 | Precision | 89 | 171 |
| Scene Text Spotting | Total-Text (test) | F-measure (None) | 72.9 | 105 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score | 85.4 | 80 |
| End-to-End Text Spotting | ICDAR 2015 (test) | Generic F-measure | 73.9 | 62 |
| End-to-End Scene Text Spotting | Total-Text | Hmean (None) | 72.9 | 55 |
| Word Spotting | ICDAR 2015 (test) | F-score (Strong lexicon) | 86.4 | 36 |
| Text Spotting | ICDAR 2015 (test) | Accuracy (Strong Lexicon) | 81.8 | 36 |
| End-to-End Text Spotting | SCUT-CTW1500 (test) | F-Measure (None Config) | 58.9 | 34 |
| End-to-End Text Spotting | ICDAR 2013 (test) | Score S | 93.4 | 25 |
| Text Spotting | CTW1500 | E2E Score (None) | 58.9 | 24 |
Showing 10 of 23 rows
