DTrOCR: Decoder-only Transformer for Optical Character Recognition

About

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
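To make the decoder-only idea concrete, here is a minimal sketch of how such a recognizer could be wired up. It assumes the image is cut into patches that act as a prefix, after which text tokens are generated one at a time. The abstract does not spell out these details, so `patchify`, `greedy_decode`, the `next_token` callable, and the `eos_id` convention are all hypothetical names chosen for illustration, and the trained language model is replaced by a caller-supplied function.

```python
from typing import Callable, List, Sequence

Patch = List[float]


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Split a 2D grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major. Rows or columns that do not fill a
    whole tile are dropped (an assumption made to keep the sketch short).
    """
    height, width = len(image), len(image[0])
    tiles: List[Patch] = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tiles.append([image[top + dy][left + dx]
                          for dy in range(patch)
                          for dx in range(patch)])
    return tiles


def greedy_decode(patches: List[Patch],
                  next_token: Callable[[List[Patch], List[int]], int],
                  eos_id: int,
                  max_len: int = 32) -> List[int]:
    """Generate token ids one at a time, conditioned on the image prefix.

    `next_token` stands in for a decoder-only model (hypothetical): given
    the patch prefix and the tokens emitted so far, it returns the next id.
    Decoding stops at `eos_id` or after `max_len` tokens.
    """
    tokens: List[int] = []
    for _ in range(max_len):
        token = next_token(patches, tokens)
        if token == eos_id:
            break
        tokens.append(token)
    return tokens
```

The point of the sketch is that there is no separate encoder: the same sequence model sees the image prefix and the partial text, and recognition reduces to next-token prediction.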

Masato Fujitake • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Scene Text Recognition | SVT (test) | Word Accuracy | 98.9 | 289 |
| Scene Text Recognition | IC15 (test) | Word Accuracy | 93.5 | 210 |
| Scene Text Recognition | IC13 (test) | Word Accuracy | 99.4 | 207 |
| Scene Text Recognition | CUTE 288 samples (test) | Word Accuracy | 99.1 | 98 |
| Scene Text Recognition | IIIT5K 3,000 samples (test) | Word Accuracy | 99.6 | 59 |
| Scene Text Recognition | SVTP 645 samples (test) | Word Accuracy | 98.6 | 48 |
| Text Recognition | Chinese text recognition benchmark | Scene Acc | 87.4 | 33 |
| Handwriting Recognition | IAM | CER | 2.38 | 32 |
| Handwritten text recognition | IAM-A (test) | CER (%) | 2.38 | 24 |
| Handwritten text recognition | IAM Aachen (test) | CER | 2.38 | 23 |
Showing 10 of 11 rows
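
The benchmarks above report two kinds of metric: word accuracy for scene text and character error rate (CER) for handwriting. The sketch below shows how these are commonly computed, assuming exact string match for word accuracy and Levenshtein distance normalized by reference length for CER. Individual benchmarks may apply their own normalization (case folding, punctuation filtering), which is not shown here, and the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer_percent(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Character error rate in percent: total edits over total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_chars


def word_accuracy_percent(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Percentage of samples whose prediction equals the reference exactly."""
    correct = sum(1 for r, h in zip(refs, hyps) if r == h)
    return 100.0 * correct / len(refs)
```

For example, with references `["hello", "world"]` and predictions `["hello", "w0rld"]`, one of ten reference characters is wrong and one of two words matches, so the CER is 10% and the word accuracy is 50%.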
