DiffusionSTR: Diffusion Model for Scene Text Recognition

About

This paper presents the Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end framework that uses diffusion models to recognize text in the wild. While existing studies have treated scene text recognition as an image-to-text transformation, we recast it as a text-to-text transformation conditioned on images within a diffusion model. We show for the first time that a diffusion model can be applied to text recognition. Furthermore, experiments on publicly available datasets show that the proposed method achieves accuracy competitive with state-of-the-art methods.

Masato Fujitake • 2023

Related benchmarks

Task                     Dataset                       Metric          Result   Rank
Scene Text Recognition   SVT (test)                    Word Accuracy   93.6     289
Scene Text Recognition   IC15 (test)                   Word Accuracy   86       210
Scene Text Recognition   IC13 (test)                   Word Accuracy   97.1     207
Scene Text Recognition   CUTE 288 samples (test)       Word Accuracy   92.5     98
Scene Text Recognition   IC15                          Accuracy        82.2     86
Scene Text Recognition   IC13                          Accuracy        97.1     66
Scene Text Recognition   IIIT5K 3,000 samples (test)   Word Accuracy   97.3     59
Scene Text Recognition   SVTP                          Accuracy        89.2     52
Scene Text Recognition   SVTP 645 samples (test)       Word Accuracy   89.2     48
Scene Text Recognition   SVT 647 images                Accuracy        93.6     33
Showing 10 of 12 rows
