PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network
About
The reading of arbitrarily-shaped text has received increasing research attention. However, existing text spotters are mostly built on two-stage frameworks or character-based methods, which suffer from Non-Maximum Suppression (NMS), Region-of-Interest (RoI) operations, or character-level annotations. In this paper, to address these problems, we propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real time. PGNet is a single-shot text spotter in which the pixel-level character classification map is learned with the proposed PG-CTC loss, avoiding the use of character-level annotations. With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS or RoI operations, which guarantees high efficiency. Additionally, a graph refinement module (GRM) that reasons about the relations between each character and its neighbors is proposed to optimize the coarse recognition and improve end-to-end performance. Experiments show that the proposed method achieves competitive accuracy while significantly improving running speed. In particular, on Total-Text it runs at 46.7 FPS, surpassing previous spotters by a large margin.
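The gather-then-decode idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the ordered center-line points of a text instance are already available, uses greedy (best-path) CTC decoding, and the names `gather_and_decode`, `one_hot`, and `BLANK`, the toy alphabet, and the blank index are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def gather_and_decode(
    char_map: Sequence[Sequence[Sequence[float]]],  # H x W x C class scores (assumed layout)
    points: Sequence[Tuple[int, int]],              # ordered (row, col) points along one text instance
    alphabet: str,                                  # alphabet[k] is the symbol for class k
    blank: int,                                     # index of the CTC blank class (assumption)
) -> str:
    # Point gathering: pick the C-dimensional classification vector at each point.
    gathered = [char_map[r][c] for r, c in points]

    # Greedy CTC decoding: argmax per point, merge consecutive repeats, drop blanks.
    labels: List[int] = [max(range(len(v)), key=v.__getitem__) for v in gathered]
    out: List[str] = []
    prev: Optional[int] = None
    for k in labels:
        if k != prev and k != blank:
            out.append(alphabet[k])
        prev = k
    return "".join(out)


def one_hot(k: int, n: int = 3) -> List[float]:
    # Toy score vector with all mass on class k.
    return [1.0 if i == k else 0.0 for i in range(n)]


BLANK = 2  # classes: 0 -> "a", 1 -> "b", 2 -> blank

# 2 x 5 x 3 toy map: row 0 is background, row 1 carries the text.
char_map = [
    [one_hot(BLANK) for _ in range(5)],
    [one_hot(0), one_hot(0), one_hot(BLANK), one_hot(1), one_hot(1)],
]
points = [(1, c) for c in range(5)]
text = gather_and_decode(char_map, points, "ab", BLANK)
print(text)
```

Because decoding only reads the map at the gathered points, the sketch involves no NMS or RoI step, which matches the efficiency argument made in the abstract.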
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text Detection | ICDAR 2015 | Precision | 80.5 | 171 |
| Scene Text Detection | ICDAR 2015 (test) | F1 Score | 88.2 | 150 |
| Text Detection | Total-Text | Recall | 86.8 | 139 |
| Text Detection | Total-Text (test) | F-Measure | 86.1 | 126 |
| Text Detection | ICDAR 2015 (test) | F1 Score | 88.2 | 108 |
| Scene Text Detection | TotalText (test) | Recall | 86.8 | 106 |
| Scene Text Spotting | Total-Text (test) | F-measure (None) | 63.1 | 105 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score | 84.1 | 80 |
| End-to-End Text Spotting | ICDAR 2015 (test) | -- | -- | 62 |
| End-to-End Scene Text Spotting | Total-Text | Hmean (None) | 63.1 | 55 |