LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
About
Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.
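The word-patch alignment objective described in the abstract can be illustrated with a small sketch of how the binary targets might be derived. This is not the authors' implementation: the function names `patch_index` and `word_patch_labels`, the square patch grid, the use of normalized bounding boxes, and the rule of assigning each word to the patch containing its box center are all assumptions made for illustration. The label convention (1 means the word's patch is masked) is also an arbitrary choice here.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), each coordinate normalized to [0, 1].
Box = Tuple[float, float, float, float]


def patch_index(box: Box, grid: int) -> int:
    """Return the row-major index of the patch that contains the box center.

    Assumes the page image is split into a grid x grid array of equal patches.
    """
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    # Clamp so that a center lying exactly on the right/bottom edge stays in range.
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    return row * grid + col


def word_patch_labels(boxes: List[Box], patch_masked: List[bool], grid: int) -> List[int]:
    """Build one binary target per word: 1 if its image patch is masked, else 0."""
    if len(patch_masked) != grid * grid:
        raise ValueError("patch_masked must contain grid * grid entries")
    return [int(patch_masked[patch_index(box, grid)]) for box in boxes]


# Toy example: a 2 x 2 grid in which only the top-right patch (index 1) is masked.
mask = [False, True, False, False]
words = [(0.1, 0.1, 0.3, 0.2), (0.6, 0.1, 0.9, 0.2)]
print(word_patch_labels(words, mask, grid=2))  # prints [0, 1]
```

In a training loop, targets like these would feed a per-token binary classification loss over the text tokens; how the model handles words spanning several patches, or text tokens that are themselves masked, is not specified in the abstract and is left out of this sketch.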
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Document Classification | RVL-CDIP (test) | Accuracy | 95.93 | 306 |
| Document Visual Question Answering | DocVQA (test) | ANLS | 83.4 | 192 |
| Information Extraction | CORD (test) | F1 Score | 97.46 | 133 |
| Entity extraction | FUNSD (test) | Entity F1 Score | 92.08 | 104 |
| Information Visual Question Answering | InfoVQA (test) | ANLS | 45.1 | 92 |
| Form Understanding | FUNSD (test) | F1 Score | 92.08 | 73 |
| Information Extraction | FUNSD (test) | F1 Score | 92.08 | 55 |
| Semantic Entity Recognition | CORD | F1 Score | 97.46 | 55 |
| Document Visual Question Answering | DocVQA v1.0 (test) | ANLS | 83.37 | 49 |
| Entity Linking | FUNSD (test) | F1 Score | 80.35 | 42 |