DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition

About

Unconstrained handwritten text recognition is a challenging computer vision task. It is traditionally handled by a two-step approach: line segmentation followed by text line recognition. For the first time, we propose an end-to-end segmentation-free architecture for the task of handwritten document recognition: the Document Attention Network (DAN). In addition to text recognition, the model is trained to label text parts using begin and end tags in an XML-like fashion. The model is made up of an FCN encoder for feature extraction and a stack of transformer decoder layers for a recurrent token-by-token prediction process. It takes whole text documents as input and sequentially outputs characters as well as logical layout tokens. Contrary to existing segmentation-based approaches, the model is trained without using any segmentation labels. We achieve competitive results on the READ 2016 dataset at page level and at double-page level, with CERs of 3.43% and 3.70%, respectively. We also provide results for the RIMES 2009 dataset at page level, reaching a CER of 4.54%. We provide all source code and pre-trained model weights at https://github.com/FactoDeepLearning/DAN.
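The token-by-token prediction described above can be illustrated with a minimal greedy decoding loop. This is only a sketch under stated assumptions, not the authors' implementation: `greedy_decode`, `make_scripted_predictor`, the `<sos>`/`<eos>` markers, and the `<page>`/`<body>` tag names are all hypothetical, and the scripted predictor stands in for a real decoder that would also attend to the encoder's image features.

```python
from typing import Callable, List

# Hypothetical start/end markers; the real model's vocabulary may differ.
START, END = "<sos>", "<eos>"


def greedy_decode(next_token: Callable[[List[str]], str], max_len: int = 100) -> List[str]:
    """Emit one token at a time, feeding the growing prefix back to the predictor."""
    tokens = [START]
    for _ in range(max_len):
        tok = next_token(tokens)  # a real decoder would also use image features
        if tok == END:
            break
        tokens.append(tok)
    return tokens[1:]  # drop the start marker


def make_scripted_predictor(script: List[str]) -> Callable[[List[str]], str]:
    """Stand-in for a trained model: replays a fixed token sequence."""
    def predict(prefix: List[str]) -> str:
        i = len(prefix) - 1  # number of tokens emitted so far
        return script[i] if i < len(script) else END
    return predict


# Characters and layout tags share one output sequence (tag names are illustrative).
script = ["<page>", "<body>", "H", "i", "</body>", "</page>"]
out = greedy_decode(make_scripted_predictor(script))
text = "".join(out)
print(text)
```

Because layout tags and characters are emitted in the same stream, the output can be read as XML-like markup without any separate segmentation step.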

Denis Coquenet, Clément Chatelain, Thierry Paquet • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Handwriting Recognition | IAM | CER | 4.54 | 39 |
| Handwritten text recognition | RIMES | Character Error Rate (CER) | 2.63 | 26 |
| Handwritten text recognition | READ 2016 (test) | CER | 4.1 | 23 |
| Optical Character Recognition | IAM HW, EN | CER | 4.3 | 17 |
| Line-level recognition | Antiqua (test) | CER | 1.83 | 11 |
| Line-level recognition | Fraktur (test) | CER | 3.03 | 11 |
| Handwritten text recognition | IAM (Lexicon split (Target)) | CER | 15 | 8 |
| Paragraph-level OCR | BnL (test) | CER | 5.24 | 7 |
| Handwritten text recognition | READ 2016 | CER | 3.43 | 6 |
| Handwritten text recognition | RIMES 2009 | CER | 0.0454 | 5 |
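
The CER values reported here are character error rates. As a rough sketch of how such a figure is commonly computed (assuming the usual convention of total edit distance divided by total reference length; the exact normalization used for each benchmark is not stated on this page), the helper names `levenshtein` and `cer` below are illustrative:

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total edit distance over total reference length, as a fraction."""
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars


score = cer(["hello world"], ["hallo world"])
print(f"CER = {100 * score:.2f}%")
```

Under this convention, a fraction of 0.0454 and a percentage of 4.54% describe the same error rate.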
