DiffusionSTR: Diffusion Model for Scene Text Recognition
About
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework that uses diffusion models to recognize text in the wild. Whereas existing studies treat scene text recognition as an image-to-text transformation, we recast it as a text-to-text transformation conditioned on the image within a diffusion model. We show for the first time that a diffusion model can be applied to text recognition. Furthermore, experiments on publicly available datasets show that the proposed method achieves accuracy competitive with state-of-the-art methods.
Masato Fujitake • 2023
Related benchmarks
| Task | Dataset | Metric | Value (%) | Rank |
|---|---|---|---|---|
| Scene Text Recognition | SVT (test) | Word Accuracy | 93.6 | 289 |
| Scene Text Recognition | IC15 (test) | Word Accuracy | 86 | 210 |
| Scene Text Recognition | IC13 (test) | Word Accuracy | 97.1 | 207 |
| Scene Text Recognition | CUTE 288 samples (test) | Word Accuracy | 92.5 | 98 |
| Scene Text Recognition | IC15 | Accuracy | 82.2 | 86 |
| Scene Text Recognition | IC13 | Accuracy | 97.1 | 66 |
| Scene Text Recognition | IIIT5K 3,000 samples (test) | Word Accuracy | 97.3 | 59 |
| Scene Text Recognition | SVTP | Accuracy | 89.2 | 52 |
| Scene Text Recognition | SVTP 645 samples (test) | Word Accuracy | 89.2 | 48 |
| Scene Text Recognition | SVT 647 images | Accuracy | 93.6 | 33 |
Showing 10 of 12 rows
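The Word Accuracy figures in the table are usually understood as the percentage of test images whose predicted string exactly matches the ground-truth word. The following is a minimal sketch of that computation, not the paper's evaluation code. It assumes a case-insensitive comparison restricted to letters and digits, which is a common scene text recognition protocol but is not confirmed by this page; `normalize` and `word_accuracy` are hypothetical helper names.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed protocol)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def word_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Return the percentage of predictions that exactly match their target word."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    # A sample counts as correct only if the whole normalized word matches.
    correct = sum(
        normalize(pred) == normalize(target)
        for pred, target in zip(predictions, targets)
    )
    return 100.0 * correct / len(targets)
```

Because the match is on whole words, a single wrong character makes the sample count as incorrect; for example, predicting `w0rld` for `world` contributes nothing to the score.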