
N-gram Injection into Transformers for Dynamic Language Model Adaptation in Handwritten Text Recognition

About

Transformer-based encoder-decoder networks have recently achieved impressive results in handwritten text recognition, partly thanks to their auto-regressive decoder, which implicitly learns a language model. However, such networks suffer a large performance drop when evaluated on a target corpus whose language distribution is shifted from the source text seen during training. To retain recognition accuracy despite this language shift, we propose an external n-gram injection (NGI) for dynamic adaptation of the network's language modeling at inference time. Our method allows switching to an n-gram language model estimated on a corpus close to the target distribution, thereby mitigating the bias without any extra training on target image-text pairs. We opt for an early injection of the n-gram into the transformer decoder so that the network learns to fully leverage text-only data at the low additional cost of n-gram inference. Experiments on three handwritten datasets demonstrate that the proposed NGI significantly reduces the performance gap between source and target corpora.
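The abstract does not detail the injection mechanism, but the core idea — rescoring the decoder's next-character distribution with a swappable external n-gram LM at inference time — can be illustrated with a minimal sketch. The `CharNgramLM` class, the padding symbol, the add-one smoothing, and the fusion `weight` are all illustrative assumptions, not the paper's actual method; the paper injects the n-gram *early* in the decoder rather than fusing at the output as done here.

```python
import math
from collections import defaultdict


class CharNgramLM:
    """Character-level n-gram LM with add-one smoothing.

    Illustrative stand-in for the external language model: the paper's
    exact estimation and smoothing choices are not specified here.
    """

    def __init__(self, n, corpus, vocab):
        self.n = n
        self.vocab = vocab
        self.counts = defaultdict(lambda: defaultdict(int))
        # "_" is a hypothetical left-padding symbol for short contexts.
        padded = "_" * (n - 1) + corpus
        for i in range(n - 1, len(padded)):
            ctx = padded[i - n + 1:i]
            self.counts[ctx][padded[i]] += 1

    def log_prob(self, context, char):
        # Keep only the last (n-1) characters of the decoding history.
        ctx = ("_" * self.n + context)[-(self.n - 1):]
        c = self.counts[ctx]
        total = sum(c.values()) + len(self.vocab)  # add-one smoothing
        return math.log((c[char] + 1) / total)


def fuse(decoder_log_probs, lm, context, weight=0.3):
    """Mix decoder and n-gram scores at each decoding step.

    This is shallow-fusion-style late rescoring, one plausible injection
    scheme; swapping `lm` for one trained on target-domain text adapts
    the recognizer without retraining on target image-text pairs.
    """
    return {ch: lp + weight * lm.log_prob(context, ch)
            for ch, lp in decoder_log_probs.items()}
```

For example, with an LM estimated on text where "b" always follows "a", a tie in the decoder's distribution after context `"a"` is broken toward `"b"` by the fused score. At inference, only the text corpus used to estimate the n-gram changes — the recognizer's weights stay fixed.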

Florent Meyer, Laurent Guichard, Yann Soullard, Denis Coquenet, Guillaume Gravier, Bertrand Coüasnon · 2026

Related benchmarks

| Task                         | Dataset                       | CER  | Rank |
|------------------------------|-------------------------------|------|------|
| Handwritten text recognition | IAM (Lexicon split, Target)   | 15.8 | 8    |
| Handwritten text recognition | RIMES (k-means split, Target) | 29.9 | 4    |
| Handwriting recognition      | N2S (Target)                  | 6.3  | 4    |
| Handwriting recognition      | N2S (Source)                  | 2.5  | 4    |
| Handwritten text recognition | IAM (Lexicon split, Source)   | 6.9  | 4    |
| Handwritten text recognition | IAM (k-means split, Source)   | 7.6  | 4    |
| Handwritten text recognition | RIMES (Lexicon split, Source) | 4.7  | 4    |
| Handwritten text recognition | RIMES (Lexicon split, Target) | 18.1 | 4    |
| Handwritten text recognition | RIMES (k-means split, Source) | 4.3  | 4    |
