
TiCLS: Tightly Coupled Language Text Spotter

About

Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues and implicitly capture local character dependencies, but they overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TiCLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model (PLM). TiCLS introduces a linguistic decoder that fuses visual and linguistic features yet can be initialized from a pretrained language model, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015 and Total-Text demonstrate that TiCLS achieves state-of-the-art performance, validating the effectiveness of PLM-guided linguistic integration for scene text spotting.
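The abstract describes the linguistic decoder only at a high level. As a rough, hypothetical illustration of the general pattern (not TiCLS's actual design), the Python sketch below shows a standard transformer-style decoder step: character embeddings first attend to one another causally, as a character-level language model would, and then cross-attend to visual features so the two streams are fused. The function names `attention` and `fusion_decoder_layer`, the single attention head, the absence of learned projections, layer normalization, and feed-forward sublayers, and the toy 2-dimensional inputs are all assumptions made for brevity.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix, causal: bool = False) -> Matrix:
    """Single-head scaled dot-product attention with no learned projections (assumption)."""
    d = len(queries[0])
    out: Matrix = []
    for i, q in enumerate(queries):
        # With causal masking, position i only sees keys/values at positions <= i.
        limit = i + 1 if causal else len(keys)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys[:limit]]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values[:limit]))
            for j in range(len(values[0]))
        ])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise residual connection.
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fusion_decoder_layer(char_embeddings: Matrix, visual_features: Matrix) -> Matrix:
    # Hypothetical layer, not the paper's decoder.
    # 1) Causal self-attention over the character sequence: the language-model-like
    #    part. In a PLM-initialized decoder these weights would come from a
    #    pretrained character-level model; this sketch has no weights at all.
    h = add(char_embeddings,
            attention(char_embeddings, char_embeddings, char_embeddings, causal=True))
    # 2) Cross-attention from characters (queries) to visual features
    #    (keys/values): this is where the two streams are fused.
    return add(h, attention(h, visual_features, visual_features))


# Usage: 3 character embeddings and 4 visual feature vectors, all 2-dimensional.
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
fused = fusion_decoder_layer(chars, visual)
print(len(fused), len(fused[0]))  # 3 2
```

Because the self-attention sublayer has the same shape as a layer in a character-level language model, a design like this could in principle load pretrained weights into that sublayer while the cross-attention sublayer learns the visual fusion. Whether TiCLS splits its decoder this way is not stated here.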

Leeje Jang, Yijun Lin, Yao-Yi Chiang, Jerod Weinman • 2026

Related benchmarks

Task | Dataset | Result | Rank
Scene Text Spotting | Total-Text (test) | -- | 105
End-to-End Text Spotting | ICDAR 2015 (test) | Generic F-measure: 81.9 | 62
End-to-End Text Spotting | SCUT-CTW1500 (test) | F-Measure (None Config): 66.2 | 34
