Robust Scene Text Recognition with Automatic Rectification

About

Recognizing text in natural images is a challenging task with many unsolved problems. Unlike words in documents, words in natural images often have irregular shapes caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). At test time, an image is first rectified via a predicted Thin-Plate-Spline (TPS) transformation into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly competitive performance on several benchmarks demonstrates the effectiveness of the proposed model.

Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai • 2016
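
As a rough illustration of the rectification step described in the abstract, the sketch below fits a thin-plate-spline mapping from evenly spaced base points on the rectified image to control points on the input image, then evaluates it at an arbitrary location. In the paper the transformation is predicted by the network; here the control points are hard-coded, and the function names, the normalized [0, 1] coordinates, and the small linear solver are assumptions made for this example, not the authors' implementation.

```python
import math


def tps_kernel(p, q):
    """Radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return 0.0 if d2 == 0.0 else d2 * math.log(d2)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists, b is an n x m list of lists.
    """
    n = len(a)
    m = [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_tps(base_pts, image_pts):
    """Fit a TPS that sends each base point (rectified image) to its image point."""
    k = len(base_pts)
    a = []
    # Top block: kernel matrix followed by the affine terms [1, x, y].
    for i in range(k):
        row = [tps_kernel(base_pts[i], base_pts[j]) for j in range(k)]
        row += [1.0, base_pts[i][0], base_pts[i][1]]
        a.append(row)
    # Bottom block: side conditions that keep the non-affine part well defined.
    for d in range(3):
        col = [1.0 if d == 0 else p[d - 1] for p in base_pts]
        a.append(col + [0.0, 0.0, 0.0])
    b = [list(p) for p in image_pts] + [[0.0, 0.0] for _ in range(3)]
    return solve(a, b)  # (k + 3) rows, one column each for x and y


def apply_tps(params, base_pts, p):
    """Map a point p on the rectified image to its location on the input image."""
    feats = [tps_kernel(p, c) for c in base_pts] + [1.0, p[0], p[1]]
    x = sum(f * w[0] for f, w in zip(feats, params))
    y = sum(f * w[1] for f, w in zip(feats, params))
    return (x, y)


# Hypothetical control points: three along the top edge, three along the bottom.
base = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
# Hand-picked points tracing a curved word in the input image (illustrative only).
curved = [(0.0, 0.3), (0.5, 0.1), (1.0, 0.3), (0.0, 0.9), (0.5, 0.7), (1.0, 0.9)]

params = fit_tps(base, curved)
center_in_input = apply_tps(params, base, (0.5, 0.5))
```

Under these assumptions, a rectified image could be produced by calling `apply_tps` for every output pixel and sampling the input image at the returned location, for example with bilinear interpolation. Because that sampling is differentiable, this kind of rectifier can be trained jointly with the recognizer.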

Related benchmarks

Task	Dataset	Metric	Result
Scene Text Recognition	SVT (test)	Word Accuracy	96.1	289
Scene Text Recognition	IIIT5K (test)	Word Accuracy	96.5	244
Scene Text Recognition	IC15 (test)	Word Accuracy	89.8	210
Scene Text Recognition	IC13 (test)	Word Accuracy	97.6	207
Scene Text Recognition	IIIT5K	Accuracy	96.2	161
Scene Text Recognition	SVTP (test)	Word Accuracy	71.8	153
Scene Text Recognition	SVT 647 (test)	Accuracy	97	101
Scene Text Recognition	CUTE 288 samples (test)	Word Accuracy	97.7	98
Scene Text Recognition	CUTE	Accuracy	59.2	92
Scene Text Recognition	CUTE80 (test)	Accuracy	0.592	87

Showing 10 of 65 rows
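
The "Word Accuracy" figures above count a prediction as correct only when the whole word matches its label. The sketch below shows one way such a score could be computed. The function name and the case-insensitive default are assumptions; this page does not state the exact protocol (case handling, lexicon use, character filtering), which can differ between benchmarks. Note also that the table mixes scales: most rows report percentages, while the CUTE80 (test) row reports a fraction.

```python
def word_accuracy(predictions, labels, case_sensitive=False):
    """Return the percentage of predictions that exactly match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    # Assumption: case-insensitive comparison unless requested otherwise.
    norm = (lambda s: s) if case_sensitive else str.lower
    correct = sum(norm(p) == norm(t) for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Made-up predictions and labels, for illustration only.
preds = ["hello", "w0rld", "Text", "rare"]
truth = ["hello", "world", "text", "RARE"]

print(word_accuracy(preds, truth))                       # 75.0
print(word_accuracy(preds, truth, case_sensitive=True))  # 25.0
```

A single wrong character makes the whole word count as an error, which is why this metric is stricter than character-level accuracy.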
