
LayoutLM: Pre-training of Text and Layout for Document Image Understanding

About

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
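The core idea of the abstract — jointly modeling text and layout — comes down to adding 2-D position embeddings for each word's bounding box (coordinates normalized to 0–1000) on top of the usual token embeddings. A minimal sketch of that input-embedding step, using toy randomly initialized tables (the sizes and names here are illustrative, not the real model's parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_COORD, DIM = 100, 1001, 16  # toy sizes; the base model uses 768-dim embeddings

# Hypothetical embedding tables, randomly initialized for illustration.
tok_emb = rng.normal(size=(VOCAB, DIM))       # token (text) embeddings
x_emb = rng.normal(size=(MAX_COORD, DIM))     # x-coordinate embeddings, shared by x0 and x1
y_emb = rng.normal(size=(MAX_COORD, DIM))     # y-coordinate embeddings, shared by y0 and y1

def layoutlm_input_embedding(token_id, bbox):
    """Sum the token embedding with 2-D position embeddings of the word's
    bounding box (x0, y0, x1, y1), coordinates normalized to 0..1000."""
    x0, y0, x1, y1 = bbox
    return tok_emb[token_id] + x_emb[x0] + y_emb[y0] + x_emb[x1] + y_emb[y1]

# One word token whose OCR bounding box spans (120, 300) to (260, 330):
vec = layoutlm_input_embedding(42, (120, 300, 260, 330))
print(vec.shape)  # (16,)
```

The resulting vectors are then fed to a standard Transformer encoder, so the only architectural change relative to BERT-style pre-training is this layout-aware input layer.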

Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou • 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Document Classification | RVL-CDIP (test) | Accuracy | 95.64 | 306
Document Visual Question Answering | DocVQA (test) | ANLS | 72.59 | 192
Information Extraction | CORD (test) | F1 Score | 96.26 | 133
Entity Extraction | FUNSD (test) | Entity F1 Score | 88.41 | 104
Form Understanding | FUNSD (test) | F1 Score | 79.27 | 73
Information Extraction | SROIE (test) | F1 Score | 95.24 | 58
Information Extraction | FUNSD (test) | F1 Score | 79.27 | 55
Semantic Entity Recognition | CORD | F1 Score | 94.93 | 55
Document Question Answering | DocVQA | ANLS | 72.59 | 52
Entity Linking | FUNSD (test) | F1 Score | 45.86 | 42

Showing 10 of 47 rows
