LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

About

Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.
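
The word-patch alignment objective described above can be illustrated with a minimal sketch of how its binary targets might be built. This is not the authors' code: the function names, the 224-pixel image with 16-pixel patches, the label convention (1 = aligned, 0 = unaligned), and the choice to skip words whose text is itself masked are all assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

# Word bounding box in pixels: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def patches_for_box(box: Box, image_size: int, patch_size: int) -> Set[int]:
    """Return the row-major indices of all grid patches that a box overlaps."""
    per_row = image_size // patch_size
    last = per_row - 1
    x0, y0, x1, y1 = box
    # Clamp to the grid so boxes touching the image border stay in range.
    col0 = min(max(x0 // patch_size, 0), last)
    col1 = min(max((x1 - 1) // patch_size, 0), last)
    row0 = min(max(y0 // patch_size, 0), last)
    row1 = min(max((y1 - 1) // patch_size, 0), last)
    return {
        row * per_row + col
        for row in range(row0, row1 + 1)
        for col in range(col0, col1 + 1)
    }


def wpa_labels(
    word_boxes: List[Box],
    text_masked: List[bool],
    masked_patches: Set[int],
    image_size: int = 224,  # assumed input resolution
    patch_size: int = 16,   # assumed patch side length
) -> List[Optional[int]]:
    """Build one alignment target per word.

    1    -> none of the word's image patches are masked ("aligned")
    0    -> at least one of the word's image patches is masked ("unaligned")
    None -> the word's text is masked, so it is left out of the loss
            (an assumption of this sketch)
    """
    labels: List[Optional[int]] = []
    for box, is_masked in zip(word_boxes, text_masked):
        if is_masked:
            labels.append(None)
            continue
        covered = patches_for_box(box, image_size, patch_size)
        labels.append(0 if covered & masked_patches else 1)
    return labels


if __name__ == "__main__":
    boxes = [(0, 0, 16, 16), (16, 0, 40, 16), (0, 16, 10, 30)]
    print(wpa_labels(boxes, [False, False, True], masked_patches={2}))
```

In this toy input, the second word spans patches 1 and 2, so masking patch 2 marks it as unaligned, while the third word is skipped because its text token is masked. A classifier head trained on such targets would have to relate each word to the visual state of its own region, which is the cross-modal signal the abstract refers to.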

Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei • 2022

Related benchmarks

Task	Dataset	Metric	Result
Document Classification	RVL-CDIP (test)	Accuracy	95.93	306
Document Visual Question Answering	DocVQA (test)	ANLS	83.4	292
Information Extraction	CORD (test)	F1 Score	97.46	136
Information Visual Question Answering	InfoVQA (test)	ANLS	45.1	130
Entity extraction	FUNSD (test)	Entity F1 Score	92.08	104
Visual Question Answering	DocVQA	ANLS	83.4	75
Form Understanding	FUNSD (test)	F1 Score	92.08	73
Information Extraction	FUNSD (test)	F1 Score	92.08	55
Semantic Entity Recognition	CORD	F1 Score	97.46	55
Document Visual Question Answering	DocVQA v1.0 (test)	ANLS	83.37	49

Showing 10 of 59 rows
