SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

About

End-to-end scene text spotting has attracted great attention in recent years due to the success of exploiting the intrinsic synergy between scene text detection and recognition. However, recent state-of-the-art methods usually combine detection and recognition simply by sharing the backbone, which does not directly take advantage of the feature interaction between the two tasks. In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter. Using a transformer encoder with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism that explicitly guides text localization through the recognition loss. The straightforward design results in a concise framework that requires neither an additional rectification module nor character-level annotation for arbitrarily-shaped text. Qualitative and quantitative experiments on the multi-oriented datasets RoIC13 and ICDAR 2015, the arbitrarily-shaped datasets Total-Text and CTW1500, and the multi-lingual datasets ReCTS (Chinese) and VinText (Vietnamese) demonstrate that SwinTextSpotter significantly outperforms existing methods. Code is available at https://github.com/mxin262/SwinTextSpotter.
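The abstract says that the recognition loss guides text localization. As a rough intuition only, the toy sketch below shows one simple way such coupling can arise: a detector output gates the feature that a recognizer reads, so the gradient of the recognition loss reaches the detector. Everything here (the scalar "detector logit", the sigmoid gate, the linear "recognizer", and all numeric values) is a hypothetical illustration, not the paper's Recognition Conversion mechanism or its code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def recognition_loss(det_logit: float, feature: float, weight: float, target: float) -> float:
    """Squared error of a toy recognizer that reads a detector-gated feature."""
    mask = sigmoid(det_logit)      # toy detector confidence that this location is text
    gated = mask * feature         # the feature is suppressed when the mask is near 0
    prediction = weight * gated    # toy linear "recognizer" (assumption)
    return (prediction - target) ** 2


def grad_wrt_det_logit(det_logit: float, feature: float, weight: float, target: float) -> float:
    """Analytic d(loss)/d(det_logit), obtained with the chain rule."""
    mask = sigmoid(det_logit)
    prediction = weight * mask * feature
    return 2.0 * (prediction - target) * weight * feature * mask * (1.0 - mask)


def numeric_grad(det_logit: float, feature: float, weight: float, target: float,
                 eps: float = 1e-6) -> float:
    """Central finite difference, used only to sanity-check the analytic gradient."""
    hi = recognition_loss(det_logit + eps, feature, weight, target)
    lo = recognition_loss(det_logit - eps, feature, weight, target)
    return (hi - lo) / (2.0 * eps)


# Hypothetical values: the detector is under-confident on a true text location.
det_logit, feature, weight, target = -1.0, 2.0, 1.5, 3.0

loss_before = recognition_loss(det_logit, feature, weight, target)
grad = grad_wrt_det_logit(det_logit, feature, weight, target)

# One gradient-descent step on the detector parameter alone.
learning_rate = 0.5
new_det_logit = det_logit - learning_rate * grad
loss_after = recognition_loss(new_det_logit, feature, weight, target)

print(f"gradient reaching the detector: {grad:.4f}")
print(f"recognition loss before/after:  {loss_before:.4f} / {loss_after:.4f}")
```

In this toy setting the gradient is non-zero, so a loss computed only on the recognizer's output still updates the detector. A shared backbone alone does not give the detection head this direct signal, which is the limitation the abstract points to.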

Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Text Detection | Total-Text | -- | 139 |
| Text Detection | Total-Text (test) | F-Measure: 88 | 126 |
| Text Detection | ICDAR 2015 (test) | F1 Score: 83.9 | 108 |
| Scene Text Detection | TotalText (test) | -- | 106 |
| Scene Text Spotting | Total-Text (test) | F-measure (None): 74.3 | 105 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score: 83.9 | 80 |
| Text Detection | CTW1500 | F-measure: 88 | 70 |
| End-to-End Text Spotting | ICDAR 2015 (test) | Generic F-measure: 70.5 | 62 |
| End-to-End Scene Text Spotting | Total-Text | Hmean (None): 74.3 | 55 |
| Text Spotting | ICDAR 2015 (test) | Accuracy (Strong Lexicon): 77.3 | 36 |
Showing 10 of 31 rows
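
Most results above are reported as an F-measure (also labelled F1 Score or Hmean), which is the harmonic mean of precision and recall. The sketch below shows that arithmetic. The counts are made-up illustrative numbers, not results from the paper, and the helper names are hypothetical. How a detection counts as a match (overlap threshold, lexicon setting) depends on each benchmark's protocol and is not specified on this page.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def score_from_counts(true_positives: int, num_predictions: int, num_ground_truth: int) -> float:
    """F-measure computed from raw match counts (hypothetical helper)."""
    precision = true_positives / num_predictions if num_predictions else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return f_measure(precision, recall)


# Made-up example: 100 predicted text instances, 90 ground-truth instances,
# 80 of the predictions judged correct by the benchmark's matching rule.
example = score_from_counts(true_positives=80, num_predictions=100, num_ground_truth=90)
print(f"F-measure: {100 * example:.1f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot reach a high F-measure by maximizing precision or recall alone.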
