MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

About

Text images contain both visual and linguistic information. However, existing pre-training techniques for text recognition mainly focus on either visual representation learning or linguistic knowledge learning. In this paper, we propose a novel approach, MaskOCR, that unifies vision and language pre-training in the classical encoder-decoder recognition framework. We adopt the masked image modeling approach to pre-train the feature encoder using a large set of unlabeled real text images, which allows us to learn strong visual representations. Rather than introducing linguistic knowledge through an additional language model, we directly pre-train the sequence decoder. Specifically, we transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder using a proposed masked image-language modeling scheme. Notably, the encoder is frozen during the pre-training phase of the sequence decoder. Experimental results demonstrate that our proposed method achieves superior performance on benchmark datasets, including Chinese and English text images.
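
The two pre-training stages described above can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `Stage` record, the `split_patches` helper, the patch count of 32, and the mask ratio of 0.5 are all hypothetical choices made for the example and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical record of one pre-training stage."""
    name: str
    data: str
    trainable: tuple  # modules whose weights are updated in this stage
    frozen: tuple     # modules kept fixed in this stage


# The two stages as the abstract describes them: the encoder is trained
# first, then held frozen while the decoder is trained.
PRETRAIN_STAGES = (
    Stage("encoder_pretraining", "unlabeled real text images",
          trainable=("encoder",), frozen=()),
    Stage("decoder_pretraining", "synthesized text images",
          trainable=("decoder",), frozen=("encoder",)),
)


def split_patches(num_patches, mask_ratio, rng):
    """Randomly partition patch indices into (visible, masked) lists.

    In masked modeling, the model sees the visible patches and is trained
    to reconstruct the masked ones. The ratio here is an assumption.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


rng = random.Random(0)  # fixed seed so the split is reproducible
visible, masked = split_patches(num_patches=32, mask_ratio=0.5, rng=rng)
```

In a real training loop, the `frozen` entry for the second stage would correspond to disabling gradient updates for the encoder's parameters while the decoder learns to predict the masked content.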

Pengyuan Lyu, Chengquan Zhang, Shanshan Liu, Meina Qiao, Yangliu Xu, Liang Wu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang • 2022

Related benchmarks

Task	Dataset	Metric	Result	
Scene Text Recognition	SVT (test)	Word Accuracy	96.9	289
Scene Text Recognition	IIIT5K (test)	Word Accuracy	98	244
Scene Text Recognition	IC15 (test)	Word Accuracy	90.1	210
Scene Text Recognition	IC13 (test)	Word Accuracy	98.2	207
Scene Text Recognition	SVTP (test)	Word Accuracy	94.6	153
Scene Text Recognition	IC13, IC15, IIIT, SVT, SVTP, CUTE80 Average of 6 benchmarks (test)	Average Accuracy	95.6	105
Scene Text Recognition	SVT 647 (test)	Accuracy	94.7	101
Scene Text Recognition	CUTE 288 samples (test)	Word Accuracy	92.7	98
Scene Text Recognition	CUTE	Accuracy	96.2	92
Scene Text Recognition	CUTE80 (test)	Accuracy	0.958	87

