Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models

About

Optical character recognition remains critical infrastructure for document digitization, yet state-of-the-art performance is often restricted to well-resourced institutions by prohibitive computational barriers. End-to-end transformer architectures achieve strong accuracy but demand hundreds of GPU hours for domain adaptation, limiting accessibility for practitioners and digital humanities scholars. We present a modular detection-and-correction framework that achieves near-SOTA accuracy with single-GPU training. Our approach decouples lightweight visual character detection (domain-agnostic) from domain-specific linguistic correction using pretrained sequence models including T5, ByT5, and BART. By training the correctors entirely on synthetic noise, we enable annotation-free domain adaptation without requiring labeled target images. Evaluating across modern clean handwriting, cursive script, and historical documents, we identify a critical "Pareto frontier" in architecture selection: T5-Base excels on modern text with standard vocabulary, whereas ByT5-Base dominates on historical documents by reconstructing archaic spellings at the byte level. Our results demonstrate that this decoupled paradigm matches end-to-end transformer accuracy while reducing compute by approximately 95%, establishing a viable, resource-efficient alternative to monolithic OCR architectures.
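The annotation-free adaptation step described above trains the corrector on synthetic noise rather than labeled target images. A minimal sketch of that idea, assuming character-level substitutions, deletions, and insertions as the noise model (function names and noise rates are illustrative, not taken from the paper):

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def inject_noise(text, rate=0.1, rng=None):
    """Corrupt clean text with random character-level edits,
    mimicking the errors a visual character detector might make."""
    rng = rng or random.Random(0)
    out = []
    for ch in text:
        r = rng.random()
        if r < rate / 3:
            continue                          # deletion
        elif r < 2 * rate / 3:
            out.append(rng.choice(ALPHABET))  # substitution
        else:
            out.append(ch)                    # keep character
        if rng.random() < rate / 3:
            out.append(rng.choice(ALPHABET))  # insertion
    return "".join(out)

def make_pairs(lines, rate=0.1, seed=0):
    """Build (noisy, clean) training pairs for a seq2seq corrector
    without requiring any labeled target-domain images."""
    rng = random.Random(seed)
    return [(inject_noise(line, rate, rng), line) for line in lines]
```

A domain-specific corrector would then be fine-tuned on such pairs drawn from in-domain text corpora; the noise distribution is the only proxy for the detector's actual error profile.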

Arundhathi Dev, Justin Zhan · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Handwriting Recognition | IAM | CER | 5.18 | 39 |
| Handwriting Recognition | CVL Modern Clean Handwriting (test) | Word Accuracy | 78.1 | 6 |
| Handwriting Recognition | George Washington Historical Handwriting | CER | 3.7 | 5 |
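The CER values above are character error rates: the Levenshtein edit distance between hypothesis and reference, normalized by reference length. A minimal reference implementation (illustrative only, not the authors' evaluation code):

```python
def cer(ref, hyp):
    """Character error rate: Levenshtein distance(ref, hyp) / len(ref),
    computed with a single rolling row of the DP table."""
    m, n = len(ref), len(hyp)
    d = list(range(n + 1))  # row 0: distance from empty ref
    for i in range(1, m + 1):
        prev, d[0] = d[0], i  # prev holds d[i-1][j-1]
        for j in range(1, n + 1):
            cur = d[j]
            d[j] = min(
                d[j] + 1,                            # deletion
                d[j - 1] + 1,                        # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution / match
            )
            prev = cur
    return d[n] / m
```

For example, `cer("abcd", "abc")` is 0.25 (one deletion over four reference characters); reported benchmark numbers are typically this ratio expressed as a percentage.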
