
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

About

End-to-end text spotting methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
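The key idea behind the weakly-supervised setting is Hungarian-style matching: each predicted word must be paired with a ground-truth transcription before a loss can be computed, and the pairing is chosen to minimize total matching cost. The sketch below is purely illustrative and is not the paper's implementation: it uses edit distance between transcriptions as the matching cost and a brute-force search over assignments (the paper operates on latent representations and would use an efficient Hungarian solver such as `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # dp[j] (old) = deletion, dp[j-1] (new) = insertion,
            # prev = substitution cell from the previous row
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]


def hungarian_match(preds, targets):
    """Min-cost one-to-one matching of predictions to targets.

    Brute force over assignments -- fine for tiny examples only.
    Assumes len(preds) >= len(targets); unmatched predictions are
    treated as "no object", as in detection-style set losses.
    Returns (list of (pred_idx, target_idx) pairs, total cost).
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(edit_distance(preds[p], t)
                   for p, t in zip(perm, targets))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    pairs = list(zip(best_perm, range(len(targets))))
    return pairs, best_cost


# Three word predictions, two annotated transcriptions:
pairs, cost = hungarian_match(["hallo", "world", "spot"],
                              ["world", "hello"])
# "world" matches exactly (cost 0); "hallo" pairs with "hello"
# at cost 1; "spot" is left unmatched.
```

In the actual method the matching cost would come from the model's recognition outputs rather than raw strings, and the loss is then computed only over the matched pairs, which is what removes the need for localization annotations.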

Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Detection | ICDAR 2015 (test) | F1 Score | 85.2 | 108 |
| Scene Text Spotting | Total-Text (test) | F-measure (None) | 78.2 | 105 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score | 85.2 | 80 |
| End-to-End Text Spotting | ICDAR 2015 (test) | Generic F-measure | 77.4 | 62 |
| End-to-End Scene Text Spotting | Total-Text | Hmean (None) | 78.2 | 55 |
| Word Spotting | ICDAR 2015 | Strong Score | 85 | 42 |
| Word Spotting | ICDAR 2015 (test) | F-score (Strong lexicon) | 85 | 36 |
| Text Spotting | ICDAR 2015 (test) | Accuracy (Strong lexicon) | 81.7 | 36 |
| Scene Text Spotting | Total-Text | F-measure (None) | 78.2 | 23 |
| End-to-end Recognition | Total-Text | -- | -- | 22 |

Showing 10 of 23 rows
