Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

About

Artistic text recognition is an extremely challenging task with a wide range of applications. However, current scene text recognition methods mainly focus on irregular text and have not explored artistic text specifically. The challenges of artistic text recognition include the varied appearance of specially designed fonts and effects, the complex connections and overlaps between characters, and severe interference from background patterns. To alleviate these problems, we propose to recognize artistic text at three levels. First, corner points are used to guide the extraction of local features inside characters, since corner structures are robust to changes in appearance and shape. The discreteness of the corner points cuts off the connections between characters, and their sparsity improves robustness to background interference. Second, we design a character contrastive loss to model character-level features, improving the feature representation for character classification. Third, we use a Transformer to learn global image-level features and to model the global relationships among the corner points, with the assistance of a corner-query cross-attention mechanism. In addition, we provide an artistic text dataset to benchmark performance. Experimental results verify the significant superiority of the proposed method on artistic text recognition, and the method also achieves state-of-the-art performance on several blurred and perspective datasets.
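The abstract names a character contrastive loss but does not give its formula. As a rough illustration of the general idea only, the sketch below uses a generic supervised contrastive formulation, which is an assumption and not necessarily the loss used in the paper. Features of the same character class are pulled together and features of different classes are pushed apart. The function name `character_contrastive_loss`, the `temperature` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def character_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over character features.

    Assumption: this is a standard formulation chosen for illustration,
    not the exact loss defined in the paper.
    features: list of non-zero feature vectors, one per character instance.
    labels:   list of character classes, aligned with `features`.
    """
    feats = [_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives are other instances of the same character class.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Temperature-scaled cosine similarity to every other instance.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all other instances.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy example: two tight clusters of 2-D "character features".
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = character_contrastive_loss(toy_features, ["a", "a", "b", "b"])
loss_mismatched = character_contrastive_loss(toy_features, ["a", "b", "a", "b"])
```

In this toy setup, the loss is small when the labels agree with the feature clusters and large when they do not, which is the behavior a contrastive objective is meant to reward.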

Xudong Xie, Ling Fu, Zhifei Zhang, Zhaowen Wang, Xiang Bai • 2022

Related benchmarks

Task                    Dataset                          Result                       Rank
Text Recognition        IIIT, SVT, IC13, IC15, SVTP, CT  IIIT Acc: 95.9               37
Scene Text Recognition  ICDAR 2015                       Accuracy (No Lexicon): 86.3  35
Scene Text Recognition  ICDAR 2013                       Accuracy: 96.4               27
Scene Text Recognition  WordArt                          Accuracy: 70.8               24
Scene Text Recognition  SVT Perspective (645)            Accuracy: 91.5               22
Scene Text Recognition  Street View Text 647             Accuracy: 94.6               22
Scene Text Recognition  IIIT5K-Words (3000)              Accuracy: 95.9               22
Scene Text Recognition  CUTE80 288                       Accuracy: 92                 20
