End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network

About

Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved by two models: the first one for line segmentation and the second one for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. This model is designed to iteratively process a paragraph image line by line. It can be split into three modules. An encoder generates feature maps from the whole paragraph image. Then, an attention module recurrently generates a vertical weighted mask that focuses on the features of the current text line. In this way, it performs a kind of implicit line segmentation. For the features of each text line, a decoder module recognizes the associated character sequence, leading to the recognition of the whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% for RIMES, 4.45% for IAM and 3.59% for READ 2016. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.

Denis Coquenet, Clément Chatelain, Thierry Paquet • 2020
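
The vertical weighted mask described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `softmax` and `collapse_vertically`, the nested-list layout `[row][column][channel]`, and the hand-picked row scores are all assumptions made for illustration. In the actual model the scores would come from a learned attention module, which is not reproduced here. The sketch only shows how a softmax over the rows of a feature map, followed by a weighted sum along the vertical axis, yields a one-dimensional sequence of features for a single text line.

```python
import math
from typing import List

# Assumed layout for illustration: features[row][column][channel].
Features = List[List[List[float]]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def collapse_vertically(features: Features, row_scores: List[float]) -> List[List[float]]:
    """Weight each row of the feature map and sum over the vertical axis.

    Returns a [column][channel] sequence, i.e. the features of one text line.
    """
    if len(features) != len(row_scores):
        raise ValueError("one score per feature-map row is required")
    weights = softmax(row_scores)
    width = len(features[0])
    channels = len(features[0][0])
    line = [[0.0] * channels for _ in range(width)]
    for weight, row in zip(weights, features):
        for x in range(width):
            for c in range(channels):
                line[x][c] += weight * row[x][c]
    return line


# Toy feature map: 3 rows, 2 columns, 1 channel.
toy = [
    [[1.0], [2.0]],
    [[10.0], [20.0]],
    [[100.0], [200.0]],
]
# Hypothetical scores that strongly favour the middle row.
line_features = collapse_vertically(toy, [0.0, 50.0, 0.0])
```

With scores that sharply favour one row, the result is close to that row's features, which is the sense in which the mask acts as an implicit line segmentation. Repeating the step with a different set of scores would select the next line.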

Related benchmarks

Task	Dataset	Result
Handwritten text recognition	IAM (test)	CER 5	102
Handwritten text recognition	RIMES	Character Error Rate (CER) 3.04	26
Handwritten text recognition	IAM-A (test)	CER (%) 4.45	24
Handwritten text recognition	READ 2016 (test)	CER 3.59	23
Handwritten text recognition	IAM Aachen (test)	CER 4.45	23
Handwritten text recognition	RIMES (test)	CER 1.91	15
Handwritten text recognition	IAM	Character Error Rate (CER) 4.32	12
Line-level recognition	Fraktur (test)	CER 3.01	11
Line-level recognition	Antiqua (test)	CER 1.85	11
Paragraph-level OCR	BnL (test)	CER 6.42	7

Showing 10 of 17 rows
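
For readers unfamiliar with the metric in the table, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the predicted and reference text, divided by the reference length and expressed as a percentage. The function names are hypothetical, and the exact evaluation protocol behind the reported numbers (for example, how errors are aggregated over a dataset) is not stated on this page, so this should be read as the usual definition rather than the one used for these results.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent, normalised by the reference length (assumed convention)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# One extra character in an 11-character reference.
example_cer = character_error_rate("handwriting", "handwritting")
```

Because the distance is normalised by the reference length, the value can exceed 100% when the prediction is much longer than the reference.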
