Primitive Representation Learning for Scene Text Recognition

About

Scene text recognition is a challenging task due to diverse variations of text instances in natural scene images. Conventional methods based on CNN-RNN-CTC or encoder-decoder with attention mechanism may not fully investigate stable and efficient feature representations for multi-oriented scene texts. In this paper, we propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images. We model elements in feature maps as the nodes of an undirected graph. A pooling aggregator and a weighted aggregator are proposed to learn primitive representations, which are transformed into high-level visual text representations by graph convolutional networks. A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding. Furthermore, by integrating visual text representations into an encoder-decoder model with the 2D attention mechanism, we propose a framework called PREN2D to alleviate the misalignment problem in attention-based methods. Experimental results on both English and Chinese scene text recognition tasks demonstrate that PREN keeps a balance between accuracy and efficiency, while PREN2D achieves state-of-the-art performance.
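The abstract names the building blocks but gives no formulas, so the following is only a toy sketch of how two such aggregators and a graph-convolution step could fit together. It is not the authors' implementation. Every name, shape, and formula here is an assumption made for illustration: `pooling_aggregator`, `weighted_aggregator`, and `graph_conv` are hypothetical, the pooling aggregator is assumed to mean-pool projected nodes, the weighted aggregator is assumed to use softmax weights over nodes, and the mixing matrix is assumed to stand in for learned graph connectivity.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pooling_aggregator(nodes, projections):
    """Assumed form: one primitive per projection, mean-pooled over all nodes."""
    primitives = []
    for proj in projections:
        projected = matmul(nodes, proj)  # shape (N, D)
        primitives.append([sum(col) / len(nodes) for col in zip(*projected)])
    return primitives

def weighted_aggregator(nodes, score_vectors):
    """Assumed form: one primitive per score vector, as a softmax-weighted sum of nodes."""
    primitives = []
    for w in score_vectors:
        weights = softmax([sum(a * b for a, b in zip(w, v)) for v in nodes])
        primitives.append([sum(wt * x for wt, x in zip(weights, col)) for col in zip(*nodes)])
    return primitives

def graph_conv(primitives, mixing, weight):
    """Assumed form: relu(mixing @ primitives @ weight)."""
    out = matmul(matmul(mixing, primitives), weight)
    return [[max(0.0, x) for x in row] for row in out]

random.seed(0)

def rand_matrix(rows, cols):
    """Random stand-in for parameters that would be learned in a real model."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

H, W, D = 2, 3, 4        # toy feature map: 2 x 3 positions, 4 channels
N_PRIMITIVES, T = 3, 5   # 3 primitives, 5 output character slots
nodes = rand_matrix(H * W, D)  # each feature-map element is one graph node

pooled = pooling_aggregator(nodes, [rand_matrix(D, D) for _ in range(N_PRIMITIVES)])
weighted = weighted_aggregator(nodes, rand_matrix(N_PRIMITIVES, D))

# Turn each set of primitives into T visual text representations, then sum
# the two branches (summing is an assumption, chosen for simplicity).
visual_a = graph_conv(pooled, rand_matrix(T, N_PRIMITIVES), rand_matrix(D, D))
visual_b = graph_conv(weighted, rand_matrix(T, N_PRIMITIVES), rand_matrix(D, D))
visual = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(visual_a, visual_b)]
print(len(visual), len(visual[0]))  # 5 4
```

In this sketch all T output rows are computed at once from the primitives, which is one way to read the abstract's "parallel decoding": no output slot depends on a previously decoded character.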

Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao • 2021

Related benchmarks

Task                    Dataset                  Metric         Result  Rank
Scene Text Recognition  SVT (test)               Word Accuracy  94      289
Scene Text Recognition  IIIT5K (test)            Word Accuracy  95.6    244
Scene Text Recognition  IC15 (test)              Word Accuracy  83      210
Scene Text Recognition  IC13 (test)              Word Accuracy  96.4    207
Scene Text Recognition  SVTP (test)              Word Accuracy  87.6    153
Scene Text Recognition  IIIT5K                   Accuracy       95.6    149
Scene Text Recognition  SVT 647 (test)           Accuracy       94      101
Scene Text Recognition  CUTE 288 samples (test)  Word Accuracy  91.7    98
Scene Text Recognition  CUTE                     Accuracy       91.7    92
Scene Text Recognition  CUTE80 (test)            Accuracy       0.917   87
Showing 10 of 33 rows
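
For readers unfamiliar with the metric, word accuracy is commonly computed as the share of test images whose predicted string matches the ground truth exactly. The sketch below assumes that definition and a case-insensitive comparison by default. The function name `word_accuracy`, its `case_sensitive` parameter, and the sample words are hypothetical, and the exact evaluation protocol behind the listed results is not stated on this page.

```python
def word_accuracy(predictions, ground_truths, case_sensitive=False):
    """Return the percentage of predictions that exactly match their ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        raise ValueError("at least one sample is required")
    correct = 0
    for pred, truth in zip(predictions, ground_truths):
        if not case_sensitive:
            pred, truth = pred.lower(), truth.lower()
        correct += pred == truth  # a word counts only if every character matches
    return 100.0 * correct / len(ground_truths)

# Made-up sample words, for illustration only.
preds = ["coffee", "SALE", "exit", "opon"]
truth = ["coffee", "sale", "exit", "open"]
print(word_accuracy(preds, truth))                       # 75.0
print(word_accuracy(preds, truth, case_sensitive=True))  # 50.0
```

The rows listed above appear to mix two scales: most results read as percentages (for example 91.7), while the CUTE80 (test) row reads as a fraction (0.917). Dividing the percentage returned by this function by 100 gives the fractional form.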
