
SCATTER: Selective Context Attentional Scene Text Recognizer

About

Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully training a deep BiLSTM encoder, thus improving the encoding of contextual dependencies. Decoding is done using a two-step 1D attention mechanism. The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer. The second attention step, similar to previous papers, treats the features as a sequence and attends to the intra-sequence relationships. Experiments show that the proposed approach surpasses SOTA performance on irregular text recognition benchmarks by 3.7% on average.
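The first attention step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the visual and contextual features are concatenated per frame, and that a single linear layer followed by a sigmoid produces one gate per channel. The function name `selective_reweight`, the sigmoid gate, and the `weights` and `bias` parameters are all hypothetical choices made for illustration. The second attention step (the sequence decoder) is not shown.

```python
import math


def sigmoid(x):
    """Logistic function, used here as an assumed gating non-linearity."""
    return 1.0 / (1.0 + math.exp(-x))


def selective_reweight(visual, contextual, weights, bias):
    """Re-weight concatenated visual and contextual features, frame by frame.

    visual:     list of T frames, each a list of Cv floats (CNN features)
    contextual: list of T frames, each a list of Cc floats (BiLSTM features)
    weights:    (Cv + Cc) x (Cv + Cc) matrix of a hypothetical linear layer
    bias:       list of Cv + Cc floats
    Returns a list of T frames, each with Cv + Cc re-weighted channels.
    """
    out = []
    for v, h in zip(visual, contextual):
        d = v + h  # concatenate the two feature vectors along the channel axis
        # One gate per channel, computed from the whole concatenated frame.
        gates = [
            sigmoid(sum(w * x for w, x in zip(row, d)) + b)
            for row, b in zip(weights, bias)
        ]
        # Element-wise product: each channel is scaled by its gate in (0, 1).
        out.append([g * x for g, x in zip(gates, d)])
    return out


if __name__ == "__main__":
    # Toy input: 2 frames, 2 visual channels, 1 contextual channel.
    visual = [[2.0, 4.0], [1.0, -1.0]]
    contextual = [[6.0], [0.5]]
    dim = 3
    weights = [[0.0] * dim for _ in range(dim)]  # untrained placeholder layer
    bias = [0.0] * dim
    for frame in selective_reweight(visual, contextual, weights, bias):
        print(frame)
```

With all-zero weights and bias, every gate equals 0.5, so each channel is simply halved. In a trained model the gates would differ per channel and per frame, which is what lets the decoder emphasize either visual or contextual evidence.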

Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Scene Text Recognition | SVT (test) | Word Accuracy 89.2 | 289 |
| Scene Text Recognition | IIIT5K (test) | Word Accuracy 92.9 | 244 |
| Scene Text Recognition | SVTP (test) | Word Accuracy 84.5 | 153 |
| Scene Text Recognition | IIIT5K | Accuracy 93.7 | 149 |
| Scene Text Recognition | CUTE | Accuracy 87.5 | 92 |
| Scene Text Recognition | CUTE80 (test) | Accuracy 0.851 | 87 |
| Scene Text Recognition | SVT | Accuracy 92.7 | 67 |
| Scene Text Recognition | IC03 | Accuracy 96.3 | 67 |
| Scene Text Recognition | IC13 | Accuracy 94.7 | 66 |
| Scene Text Recognition | IC 2013 (test) | Accuracy 93.8 | 51 |
Showing 10 of 12 rows
