Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription

About

Handwriting text recognition (HTR) remains a challenging task. Existing approaches require fine-tuning on labeled data, which is impractical to obtain for real-world problems, or rely on zero-shot tools such as OCR engines and multi-modal LLMs (MLLMs). MLLMs have shown promise both as end-to-end transcribers and as OCR post-processors, but to date there is little empirical research evaluating different MLLM prompting strategies for HTR, particularly for the case of multi-page documents. Most handwritten documents are multi-page, and share context such as semantic content and handwriting style across pages, yet MLLMs are typically used for transcription at the page level, meaning they throw away this shared context. They are also typically used as either text-only post-processors or image-only OCR alternatives, rather than leveraging multiple modes. This paper investigates a suite of methods combining OCR, LLM post-processing and MLLM end-to-end transcription, for the task of zero-shot multi-page handwritten document transcription. We introduce a benchmark for this task from existing single-page datasets, including a new dataset, Malvern-Hills. Finally, we introduce OCR+PAGE-1 and OCR+PAGE-N, prompting strategies for multi-page transcription that outperform existing methods by sharing content across pages while minimizing prompt complexity.

Benjamin Gutteridge, Matthew Thomas Jackson, Toni Kukurin, Xiaowen Dong • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Document Text Transcription | Malvern-Hills-5+ | CER | 5.43 | 25 |
| Handwriting Recognition | IAM-5-Random | CER | 0.7 | 22 |
| Handwritten text recognition | IAM-5 | CER | 0.47 | 22 |
| Handwriting Recognition | IAM | CER | 0.63 | 20 |
| Handwriting Transcription | IAM | CER | 0.63 | 20 |
| Handwriting Transcription | Bentham dataset | CER | 0.0848 | 19 |
| Handwritten text recognition | Malvern-Hills | CER | 5.83 | 19 |
| Optical Character Recognition | Bentham (test) | CER | 8.48 | 19 |
| Transcription | Malvern-Hills dataset (test) | CER (%) | 5.83 | 19 |
| Transcription | Malvern-Hills-10+ 1.0 (test) | CER | 4.76 | 15 |
Showing 10 of 11 rows
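
The results above are reported as character error rate (CER). This page does not define the metric, so the following is a minimal sketch that assumes the common definition: the character-level Levenshtein edit distance between a reference transcription and a model's output, divided by the reference length. The function names `levenshtein` and `cer` are illustrative and are not taken from the paper. Some rows appear to give a fraction and others a percentage (for example 0.0848 and 8.48 for Bentham), so the sketch returns a fraction and leaves scaling to the caller.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the ref prefix processed so far
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a fraction of the reference length
    (assumed definition; multiply by 100 for a percentage)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "handwriting"
    hypothesis = "handwritting"  # one extra character
    print(f"CER = {cer(reference, hypothesis):.4f}")
```

Because the denominator is the reference length, the value can exceed 1 when the output is much longer than the reference. Whether the benchmark normalizes whitespace or case before scoring is not stated here, so the sketch compares the strings exactly as given.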
