End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network
About
Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved by two models: the first one for line segmentation and the second one for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. This model is designed to iteratively process a paragraph image line by line. It can be split into three modules. An encoder generates feature maps from the whole paragraph image. Then, an attention module recurrently generates a vertical weighted mask that focuses on the features of the current text line, thereby performing a kind of implicit line segmentation. For each set of text line features, a decoder module recognizes the associated character sequence, leading to the recognition of the whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% for RIMES, 4.45% for IAM and 3.59% for READ 2016. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.
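
The vertical weighted mask described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names (`softmax`, `vertical_attention`) are hypothetical, and the per-row scores are passed in directly, whereas in the actual model they would be produced by the learned attention module. The sketch only shows the core idea of normalizing one score per feature row and collapsing the vertical axis into a single line representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vertical_attention(features, row_scores):
    """Collapse an H x W x C feature map into W x C line features.

    features   : nested lists indexed as features[row][column][channel]
    row_scores : one unnormalized score per row (assumed to be given here;
                 a trained attention module would compute them)
    """
    weights = softmax(row_scores)  # one weight per row, summing to 1
    width = len(features[0])
    channels = len(features[0][0])
    line = [[0.0] * channels for _ in range(width)]
    for weight, row in zip(weights, features):
        for w in range(width):
            for c in range(channels):
                line[w][c] += weight * row[w][c]
    return weights, line


# Toy feature map: 3 rows, 2 columns, 1 channel.
features = [
    [[1.0], [2.0]],
    [[10.0], [20.0]],
    [[100.0], [200.0]],
]
# Scores that strongly favor the middle row, i.e. the "current" text line.
weights, line = vertical_attention(features, [0.0, 50.0, 0.0])
print(weights)
print(line)
```

With scores this sharply peaked, the output is essentially the middle row of the feature map, which is the sense in which the mask acts as an implicit line segmentation. Repeating the step with a different set of scores would select the next line.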
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Handwritten text recognition | IAM (test) | CER | 5 | 102 |
| Handwritten text recognition | IAM-A (test) | CER (%) | 4.45 | 24 |
| Handwritten text recognition | READ 2016 (test) | CER | 3.59 | 23 |
| Handwritten text recognition | IAM Aachen (test) | CER | 4.45 | 23 |
| Handwritten text recognition | RIMES (test) | CER | 1.91 | 15 |
| Handwriting Recognition | IAM page paragraph | CER | 4.5 | 6 |
| Handwritten text recognition | IAM-B (test) | CER | 4.32 | 6 |
| Handwritten text recognition | READ 2016 | CER | 4.1 | 6 |
| Handwritten text recognition | RIMES line level (test) | CER | 3.04 | 5 |
| Handwritten Document Recognition | READ Line level 2016 (test) | CER | 4.1 | 4 |
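
The CER figures in the table are character error rates. As a rough sketch of how such a metric is commonly computed, the code below divides the total Levenshtein edit distance between predictions and ground truth by the total number of ground-truth characters. The function names (`edit_distance`, `cer`) are hypothetical, and the exact normalization and text preprocessing behind each benchmark entry are assumptions that may differ from what is shown here.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(references, hypotheses):
    """Character error rate in percent, normalized by total reference length.

    Assumes at least one non-empty reference string.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


# One substituted character out of eleven reference characters.
print(cer(["hello world"], ["hallo world"]))
```

Normalizing by the summed reference length, rather than averaging per-sample rates, keeps short lines from dominating the score.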