LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

About

Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.

Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei • 2022
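
The word-patch alignment objective mentioned in the abstract can be illustrated with a small sketch of how its binary targets might be built. This is not the authors' implementation: the function name, the one-patch-per-word mapping, the 0/1 label encoding, and the choice to skip words that are themselves masked are all assumptions made for illustration.

```python
from typing import List, Optional, Set


def word_patch_alignment_labels(
    word_to_patch: List[int],
    masked_patches: Set[int],
    masked_words: Set[int],
) -> List[Optional[int]]:
    """Build one alignment target per word (hypothetical helper).

    word_to_patch[i] is the index of the image patch that covers word i
    (assumption: each word maps to exactly one patch).
    Returns 1 if that patch is masked, 0 if it is visible, and None for
    words that are masked on the text side (assumption: such words are
    left out of the alignment loss).
    """
    labels: List[Optional[int]] = []
    for word_idx, patch_idx in enumerate(word_to_patch):
        if word_idx in masked_words:
            labels.append(None)  # skipped, no alignment target
        elif patch_idx in masked_patches:
            labels.append(1)  # corresponding patch is masked
        else:
            labels.append(0)  # corresponding patch is visible
    return labels


# Four words: the first two share patch 0, the others sit on patches 3 and 5.
# Patch 3 is masked in the image, and word 1 is masked in the text.
example = word_patch_alignment_labels([0, 0, 3, 5], {3}, {1})
print(example)  # [0, None, 1, 0]
```

In a real training setup a classifier head would predict these targets from the contextual text representations; the sketch only covers label construction.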

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Document Classification | RVL-CDIP (test) | Accuracy 95.93 | 306 |
| Document Visual Question Answering | DocVQA (test) | ANLS 83.4 | 192 |
| Information Extraction | CORD (test) | F1 Score 97.46 | 133 |
| Entity extraction | FUNSD (test) | Entity F1 Score 92.08 | 104 |
| Information Visual Question Answering | InfoVQA (test) | ANLS 45.1 | 92 |
| Form Understanding | FUNSD (test) | F1 Score 92.08 | 73 |
| Information Extraction | FUNSD (test) | F1 Score 92.08 | 55 |
| Semantic Entity Recognition | CORD | F1 Score 97.46 | 55 |
| Document Visual Question Answering | DocVQA v1.0 (test) | ANLS 83.37 | 49 |
| Entity Linking | FUNSD (test) | F1 Score 80.35 | 42 |
Showing 10 of 50 rows
