
Supervised mid-level features for word image representation

About

This paper addresses the problem of learning word image representations: given the cropped image of a word, we are interested in finding a descriptive, robust, and compact fixed-length representation. Machine learning techniques can then be supplied with these representations to produce models useful for word retrieval or recognition tasks. Although many works have focused on the machine learning aspect once a global representation has been produced, little work has been devoted to the construction of those base image representations: most works apply standard coding and aggregation techniques directly on top of standard computer vision features such as SIFT or HOG. We propose to learn local mid-level features suitable for building word image representations. These features are learnt by leveraging character bounding box annotations on a small set of training images. However, unlike other approaches that use character bounding box information, our approach does not rely on explicitly detecting the individual characters at test time. Our local mid-level features can then be aggregated to produce a global word image signature. When pairing these features with the recent word attributes framework of Almazán et al., we obtain results comparable with or better than the state of the art on matching and recognition tasks, using global descriptors of only 96 dimensions.
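The abstract describes aggregating a variable number of local mid-level features into one fixed-length word signature that can then be matched for retrieval. It does not specify the pooling scheme, so the following Python sketch is only an illustration under stated assumptions, not the paper's method: each local feature is assumed to be a non-negative score vector tagged with its normalized horizontal position, features are max-pooled within equal-width horizontal regions, the result is L2-normalized, and gallery images are ranked by dot product. The function names (aggregate, l2_normalize, similarity, rank), the region count, and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format (an assumption, not from the paper): each local
# mid-level feature is (normalized horizontal position in [0, 1], score vector).
LocalFeature = Tuple[float, Sequence[float]]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def aggregate(features: Sequence[LocalFeature], dim: int, regions: int = 2) -> List[float]:
    """Max-pool local features within equal-width horizontal regions and
    concatenate the regions into one L2-normalized vector of length dim * regions.

    The output length does not depend on how many local features the word has.
    Starting each region at 0.0 assumes the scores are non-negative.
    """
    pooled = [[0.0] * dim for _ in range(regions)]
    for x, vec in features:
        if len(vec) != dim:
            raise ValueError(f"expected a {dim}-dimensional feature, got {len(vec)}")
        # Map the position to a region index; clamp so that x == 1.0 lands in the last region.
        r = min(max(int(x * regions), 0), regions - 1)
        pooled[r] = [max(p, v) for p, v in zip(pooled[r], vec)]
    return l2_normalize([v for region in pooled for v in region])


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for nonzero L2-normalized signatures."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted by decreasing similarity to the query."""
    scores = [similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional scores, e.g. hypothetical detectors for 'a', 'b', 'c'.
    ab = aggregate([(0.2, [0.9, 0.1, 0.0]), (0.7, [0.1, 0.8, 0.0])], dim=3)
    ba = aggregate([(0.2, [0.1, 0.8, 0.0]), (0.7, [0.9, 0.1, 0.0])], dim=3)
    print(rank(ab, [ba, ab]))  # prints [1, 0]
```

With a single region (regions=1) the toy words "ab" and "ba" would receive identical signatures, because pooling over the whole word discards character order; splitting the width into regions keeps coarse positional information while still producing a fixed-length vector. The compact 96-dimensional descriptors reported in the abstract come from pairing the learned features with the word attributes framework, which this sketch does not model.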

Albert Gordo • 2014

Related benchmarks

Task                    Dataset                                  Result                       Rank
Scene Text Recognition  SVT (test)                               Word Accuracy: 91.8          289
Scene Text Recognition  IIIT5K (test)                            Word Accuracy: 93.3          244
Scene Text Recognition  IIIT5K                                   Accuracy: 93.3               149
Text Recognition        Street View Text (SVT)                   Accuracy: 91.8               80
Scene Text Recognition  SVT                                      --                           67
Scene Text Recognition  IIIT5K                                   Accuracy (50 Lexicon): 93.3  28
Scene Text Recognition  ICDAR case-insensitive 2013 (test)       Accuracy: 93.3               22
Scene Text Recognition  Street View Text 50-word lexicon (test)  Accuracy: 0.907              15
Text Recognition        IIIT 5k-word                             Accuracy: 93.3               11
Scene Text Recognition  IIIT 5K-word 1k-word lexicon (test)      Accuracy: 86.6               6
