ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification

About

Automated recognition of text in scenes has been a research challenge for years, largely due to the arbitrary variation of text appearance in perspective distortion, text line curvature, text styles and different types of imaging artifacts. Recent deep networks are capable of learning representations that are robust to imaging artifacts and text style changes, but they still face various problems when dealing with scene texts with perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature, driven by better scene text recognition performance. An innovative rectification network is developed which employs a novel line-fitting transformation to estimate the pose of text lines in scenes. In addition, an iterative rectification pipeline is developed where scene text distortions are corrected iteratively towards a fronto-parallel view. ESIR is also robust to parameter initialization, and its training needs only scene text images and word-level annotations, as required by most scene text recognition systems. Extensive experiments over a number of public datasets show that the proposed ESIR is capable of rectifying scene text distortions accurately, achieving superior recognition performance for both normal scene text images and those suffering from perspective and curvature distortions.
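
To make the idea of iterative rectification concrete, the following toy sketch may help. It is not the ESIR network: the learned rectification network is replaced by a plain least-squares polynomial fit, the image is replaced by a set of 2D points sampled along a curved text line, and every name in it (`estimate_centerline`, `rectify_once`, `iterative_rectify`, `strength`) is a hypothetical choice made for illustration. The `strength` parameter is an assumption that mimics an estimator which removes only part of the distortion per pass, which is why several passes are useful.

```python
import numpy as np


def estimate_centerline(points, degree=2):
    """Fit y = poly(x) to the points by least squares.

    Stand-in for a learned pose estimator; an assumption, not the paper's network.
    """
    return np.polyfit(points[:, 0], points[:, 1], degree)


def rectify_once(points, coeffs, strength=0.7):
    """Shift each point vertically so the fitted centerline moves toward y = 0."""
    offset = np.polyval(coeffs, points[:, 0])
    out = points.copy()
    out[:, 1] -= strength * offset
    return out


def iterative_rectify(points, iterations=5, degree=2, strength=0.7):
    """Repeat estimate-then-rectify, recording the largest remaining deviation."""
    history = [float(np.abs(points[:, 1]).max())]
    for _ in range(iterations):
        coeffs = estimate_centerline(points, degree)
        points = rectify_once(points, coeffs, strength)
        history.append(float(np.abs(points[:, 1]).max()))
    return points, history


# A curved, slightly tilted "text line" sampled at 21 points (synthetic data).
x = np.linspace(-1.0, 1.0, 21)
curved = np.stack([x, 0.5 * x**2 + 0.2 * x + 0.1], axis=1)

flat, history = iterative_rectify(curved)
print(history)  # remaining deviation from a straight horizontal line, per pass
```

In this toy setting each pass leaves a smaller residual curvature than the one before, so the point set approaches a straight horizontal line, loosely analogous to correcting a text image towards a fronto-parallel view.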

Fangneng Zhan, Shijian Lu • 2018

Related benchmarks

Task                     Dataset          Result                Rank
Scene Text Recognition   SVT (test)       Word Accuracy 90.2    289
Scene Text Recognition   IIIT5K (test)    Word Accuracy 93.3    244
Scene Text Recognition   IC15 (test)      Word Accuracy 76.9    210
Scene Text Recognition   IC13 (test)      Word Accuracy 91.3    207
Scene Text Recognition   SVTP (test)      Word Accuracy 79.6    153
Scene Text Recognition   IIIT5K           Accuracy 93.3         149
Scene Text Recognition   CUTE             Accuracy 83.3         92
Scene Text Recognition   CUTE80 (test)    Accuracy 0.833        87
Scene Text Recognition   IC15             Accuracy 76.9         86
Scene Text Recognition   SVT              Accuracy 90.2         67
Showing 10 of 25 rows
