Synthetic Data for Text Localisation in Natural Images
About
In this paper we introduce a new method for text detection in natural images. The method comprises two contributions. First, a fast and scalable engine generates synthetic images of text in clutter. This engine overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN), which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark. Furthermore, it can process 15 images per second on a GPU.
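The F-measure quoted above is conventionally the harmonic mean of detection precision and recall. The following is a minimal sketch of that formula only; the function name `f_measure` and the input values are illustrative assumptions, not figures from the paper, and the box-matching protocol that ICDAR uses to obtain precision and recall is not reproduced here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when nothing is detected or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs chosen for illustration; not results from the paper.
score = f_measure(precision=0.9, recall=0.8)
print(f"F-measure: {score:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot reach a high F-measure by trading away recall for precision or the reverse.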
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text Detection | ICDAR 2013 (test) | F1 Score | 83 | 88 |
| Text Localization | ICDAR 2013 (test) | Recall | 76 | 28 |
| End-to-End Text Spotting | ICDAR 2013 (test) | -- | -- | 25 |
| End-to-End Text Spotting | ICDAR 2011 (test) | F-measure | 84.3 | 12 |
| End-to-end Recognition | ICDAR 2013 | Strong F-Measure | 85 | 8 |
| End-to-End Text Spotting | Street View Text (SVT) | Max F-measure | 55.7 | 7 |
| End-to-End Text Spotting | Street View Text SVT-50 Constrained lexicon | Maximum F1-Score | 68 | 7 |
| Word Spotting | ICDAR 2013 (test) | Generic Metric (G) | 84.7 | 6 |