Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models

About

Optical character recognition remains critical infrastructure for document digitization, yet state-of-the-art performance is often restricted to well-resourced institutions by prohibitive computational barriers. End-to-end transformer architectures achieve strong accuracy but demand hundreds of GPU hours for domain adaptation, limiting accessibility for practitioners and digital humanities scholars. We present a modular detection-and-correction framework that achieves near-SOTA accuracy with single-GPU training. Our approach decouples lightweight visual character detection (domain-agnostic) from domain-specific linguistic correction using pretrained sequence models including T5, ByT5, and BART. By training the correctors entirely on synthetic noise, we enable annotation-free domain adaptation without requiring labeled target images. Evaluating across modern clean handwriting, cursive script, and historical documents, we identify a critical "Pareto frontier" in architecture selection: T5-Base excels on modern text with standard vocabulary, whereas ByT5-Base dominates on historical documents by reconstructing archaic spellings at the byte level. Our results demonstrate that this decoupled paradigm matches end-to-end transformer accuracy while reducing compute by approximately 95%, establishing a viable, resource-efficient alternative to monolithic OCR architectures.
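The abstract says the correctors are trained entirely on synthetic noise, but this page does not describe the noise model. As a minimal sketch of the general idea, and not the authors' procedure, the snippet below corrupts clean text with random substitutions, insertions, and deletions to produce (noisy, clean) pairs that a sequence model such as T5 or ByT5 could be fine-tuned on. The `CONFUSIONS` table, the function names, and the default `rate` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical confusion table: visually similar characters that a
# character detector might plausibly swap. Not taken from the paper.
CONFUSIONS = {"l": "1", "o": "0", "e": "c", "m": "rn", "u": "v"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_synthetic_noise(text, rate=0.1, seed=None):
    """Corrupt `text` with character-level edits at roughly the given rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(["substitute", "delete", "insert"])
        if op == "substitute":
            # Prefer a look-alike; fall back to a random letter.
            out.append(CONFUSIONS.get(ch, rng.choice(ALPHABET)))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": append nothing
    return "".join(out)


def make_training_pairs(lines, rate=0.1, seed=0):
    """Return (noisy_input, clean_target) pairs for corrector training."""
    rng = random.Random(seed)
    return [
        (add_synthetic_noise(line, rate, rng.randrange(2**32)), line)
        for line in lines
    ]
```

Because only clean text from the target domain is needed to build such pairs, no labeled target images are involved, which is the sense in which the abstract calls the adaptation annotation-free.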

Arundhathi Dev, Justin Zhan • 2026

Related benchmarks

Task	Dataset	Result
Handwriting Recognition	IAM	CER 5.18	39
Handwriting Recognition	CVL Modern Clean Handwriting (test)	Word Accuracy 78.1	6
Handwriting Recognition	George Washington Historical Handwriting	CER 3.7	5
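
For readers unfamiliar with the CER metric in the table: character error rate is conventionally the Levenshtein edit distance between the recognized text and the reference, divided by the reference length. The sketch below assumes that standard definition; this page does not state how the reported figures were computed or normalized, and the values shown are presumably percentages.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a single wrong character in a four-character reference gives `cer("abcd", "abxd") == 0.25`, that is, 25%.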
