Robust Scene Text Recognition with Automatic Rectification

About

Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.

Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai • 2016
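
The rectification step described in the abstract can be illustrated with a small NumPy sketch of a thin-plate-spline (TPS) warp. This is not the authors' implementation: in RARE the control points on the input image are predicted by a network and the sampling is differentiable, whereas here the points are passed in by hand and nearest-neighbour sampling is swapped in to keep the example short. The function names (`fit_tps`, `apply_tps`, `rectify`) and the normalized [-1, 1] coordinate convention are assumptions made for illustration.

```python
import numpy as np


def _tps_kernel(sq_dist):
    # Radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0.
    safe = np.where(sq_dist == 0.0, 1.0, sq_dist)
    return sq_dist * np.log(safe)


def fit_tps(base_pts, target_pts):
    """Solve for TPS parameters mapping base_pts (K, 2) onto target_pts (K, 2)."""
    k = base_pts.shape[0]
    diff = base_pts[:, None, :] - base_pts[None, :, :]
    kernel = _tps_kernel((diff ** 2).sum(-1))          # (K, K)
    affine = np.hstack([np.ones((k, 1)), base_pts])    # (K, 3)
    system = np.zeros((k + 3, k + 3))
    system[:k, :k] = kernel
    system[:k, k:] = affine
    system[k:, :k] = affine.T
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = target_pts
    return np.linalg.solve(system, rhs)                # (K + 3, 2)


def apply_tps(params, base_pts, query_pts):
    """Map query_pts (N, 2) through the fitted TPS transformation."""
    k = base_pts.shape[0]
    diff = query_pts[:, None, :] - base_pts[None, :, :]
    kernel = _tps_kernel((diff ** 2).sum(-1))          # (N, K)
    affine = np.hstack([np.ones((len(query_pts), 1)), query_pts])
    return kernel @ params[:k] + affine @ params[k:]


def rectify(image, base_pts, target_pts, out_h, out_w):
    """Build the rectified image by sampling the input at TPS-mapped locations.

    base_pts lie on the rectified image, target_pts on the input image,
    both as (x, y) pairs in normalized [-1, 1] coordinates.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
    src = apply_tps(fit_tps(base_pts, target_pts), base_pts, grid)
    h, w = image.shape[:2]
    # Nearest-neighbour lookup (an assumption; not differentiable).
    cols = np.clip(np.rint((src[:, 0] + 1) / 2 * (w - 1)), 0, w - 1).astype(int)
    rows = np.clip(np.rint((src[:, 1] + 1) / 2 * (h - 1)), 0, h - 1).astype(int)
    return image[rows, cols].reshape(out_h, out_w, *image.shape[2:])
```

In this sketch, placing the base control points evenly along the top and bottom edges of the output and the target points along the upper and lower boundary of a curved word would pull the word onto a straight baseline. When the two point sets coincide, the warp reduces to the identity.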

Related benchmarks

Task                   | Dataset                 | Result             | Rank
Scene Text Recognition | SVT (test)              | Word Accuracy 96.1 | 289
Scene Text Recognition | IIIT5K (test)           | Word Accuracy 96.5 | 244
Scene Text Recognition | IC15 (test)             | Word Accuracy 89.8 | 210
Scene Text Recognition | IC13 (test)             | Word Accuracy 97.6 | 207
Scene Text Recognition | SVTP (test)             | Word Accuracy 71.8 | 153
Scene Text Recognition | IIIT5K                  | Accuracy 96.2      | 149
Scene Text Recognition | SVT 647 (test)          | Accuracy 97        | 101
Scene Text Recognition | CUTE 288 samples (test) | Word Accuracy 97.7 | 98
Scene Text Recognition | CUTE                    | Accuracy 59.2      | 92
Scene Text Recognition | CUTE80 (test)           | Accuracy 0.592     | 87
Showing 10 of 65 rows
