Text Spotting Transformers
About
In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild. TESTR builds upon a single encoder and dual decoders for joint text-box control point regression and character recognition. Unlike most existing methods, our method is free from Region-of-Interest operations and heuristics-driven post-processing procedures; TESTR is particularly effective when dealing with curved text-boxes, where special care is needed to adapt traditional bounding-box representations. We show that our canonical representation of control points is suitable for text instances in both Bezier curve and polygon annotations. In addition, we design a bounding-box guided polygon detection (box-to-polygon) process. Experiments on curved and arbitrarily shaped datasets demonstrate state-of-the-art performance of the proposed TESTR algorithm.
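To make the relationship between Bezier curve and polygon annotations concrete, the following is a minimal sketch of how a curved text boundary given as Bezier control points could be sampled into polygon vertices. It is not taken from the TESTR implementation. The layout assumed here (two cubic curves per text instance, four control points each, top edge listed left to right and bottom edge right to left) and the names `cubic_bezier` and `bezier_to_polygon` are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(ctrl: List[Point], t: float) -> Point:
    """Evaluate a cubic Bezier curve (4 control points) at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    # Bernstein basis polynomials of degree 3.
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (
        b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
        b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3,
    )


def bezier_to_polygon(
    top: List[Point], bottom: List[Point], n_per_side: int = 8
) -> List[Point]:
    """Sample both curves uniformly in t and join them into one polygon.

    Assumption (hypothetical convention): `top` runs left to right and
    `bottom` runs right to left, so concatenating the samples yields a
    closed ring of vertices.
    """
    if n_per_side < 2:
        raise ValueError("n_per_side must be at least 2")
    ts = [i / (n_per_side - 1) for i in range(n_per_side)]
    top_pts = [cubic_bezier(top, t) for t in ts]
    bottom_pts = [cubic_bezier(bottom, t) for t in ts]
    return top_pts + bottom_pts


if __name__ == "__main__":
    # A gently arched text line, coordinates in pixels (made-up values).
    top_curve = [(0.0, 10.0), (10.0, 0.0), (20.0, 0.0), (30.0, 10.0)]
    bottom_curve = [(30.0, 20.0), (20.0, 10.0), (10.0, 10.0), (0.0, 20.0)]
    for vertex in bezier_to_polygon(top_curve, bottom_curve, n_per_side=4):
        print(vertex)
```

Under these assumptions, both annotation styles reduce to an ordered list of boundary points, which is one way to read the idea of a shared control-point representation.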
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text Detection | ICDAR 2015 | Precision | 90.3 | 171 |
| Text Detection | Total-Text | Recall | 81.4 | 139 |
| Text Detection | Total-Text (test) | F-Measure | 86.9 | 126 |
| Text Detection | ICDAR 2015 (test) | F1 Score | 90 | 108 |
| Scene Text Detection | TotalText (test) | Recall | 83.7 | 106 |
| Scene Text Spotting | Total-Text (test) | F-measure (None) | 73.3 | 105 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score | 85.2 | 80 |
| Text Detection | CTW1500 | F-measure | 87.1 | 70 |
| Scene Text Detection | Total-Text | Precision | 92.8 | 63 |
| End-to-End Text Spotting | ICDAR 2015 (test) | Generic F-measure | 73.6 | 62 |