Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
About
End-to-end text spotting methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually keep a strict separation between the detection and recognition branches, requiring exact annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter achieves state-of-the-art results on multiple benchmarks.
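The Hungarian-style loss mentioned above relies on bipartite matching: each ground-truth word is paired with exactly one predicted word instance so that the total matching cost is minimal, and the loss is then computed over the matched pairs. A minimal illustrative sketch follows; the cost values are made up, and the brute-force search stands in for an efficient solver such as `scipy.optimize.linear_sum_assignment` (this is not the paper's exact formulation):

```python
from itertools import permutations

def hungarian_match(cost):
    """Brute-force optimal one-to-one assignment between predictions (rows)
    and ground-truth words (columns), minimizing total cost.

    Practical systems use scipy.optimize.linear_sum_assignment; this
    exhaustive version is only for illustration on tiny inputs."""
    n_pred, n_gt = len(cost), len(cost[0])
    best, best_cost = None, float("inf")
    # Try every way of assigning each ground-truth word a distinct prediction.
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[p][g] for g, p in enumerate(perm))
        if total < best_cost:
            best, best_cost = [(p, g) for g, p in enumerate(perm)], total
    return best

# Illustrative cost matrix: 3 predicted word queries vs. 2 ground-truth words.
# Each entry stands in for a combined detection/transcription matching cost
# (the numbers are assumptions for demonstration only).
cost = [
    [0.2, 0.9],
    [0.8, 0.1],
    [0.5, 0.6],
]
print(hungarian_match(cost))  # -> [(0, 0), (1, 1)]
```

In a weakly-supervised setting like the one described above, the matching cost can be built from annotation-free terms (e.g. transcription agreement) rather than box overlap, which is what lets localization labels be dropped.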
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Detection | ICDAR 2015 (test) | F1 Score | 85.2 | 108 |
| Scene Text Spotting | Total-Text (test) | F-measure (None) | 78.2 | 105 |
| End-to-End Text Spotting | ICDAR 2015 | Strong Score | 85.2 | 80 |
| End-to-End Text Spotting | ICDAR 2015 (test) | Generic F-measure | 77.4 | 62 |
| End-to-End Scene Text Spotting | Total-Text | Hmean (None) | 78.2 | 55 |
| Word Spotting | ICDAR 2015 | Strong Score | 85 | 42 |
| Word Spotting | ICDAR 2015 (test) | F-score (Strong lexicon) | 85 | 36 |
| Text Spotting | ICDAR 2015 (test) | Accuracy (Strong Lexicon) | 81.7 | 36 |
| Scene Text Spotting | Total-Text | F-measure (None) | 78.2 | 23 |
| End-to-end Recognition | Total-Text | -- | -- | 22 |