BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

About

Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models that combine visual features from document images with texts and their layout. In contrast, this paper tackles the problem by going back to the basics: an effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes relative positions of texts in 2D space and learns from unlabeled documents with an area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS performs comparably to or better than previous methods on four KIE benchmarks (FUNSD, SROIE*, CORD, and SciTSR) without relying on visual features. This paper also reveals two real-world challenges in KIE tasks, (1) minimizing the error from incorrect text ordering and (2) efficient learning from fewer downstream examples, and demonstrates the superiority of BROS over previous methods on both. Code is available at https://github.com/clovaai/bros.
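To make the two ideas in the abstract more concrete, the following is a minimal sketch, not the authors' implementation (which is in the repository linked above). It assumes each text block is described by a hypothetical bounding box `(x_min, y_min, x_max, y_max)`. The helper names `relative_positions` and `area_mask` are invented for illustration. The first computes pairwise offsets between box centers as one simple notion of relative 2D position. The second flags every box whose center falls inside a rectangular area, which is one plausible way to choose what to mask together.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in page coordinates.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def relative_positions(boxes: List[Box]) -> List[List[Tuple[float, float]]]:
    """rel[i][j] is the (dx, dy) offset from the center of box i to box j."""
    centers = [center(b) for b in boxes]
    return [
        [(cx_j - cx_i, cy_j - cy_i) for (cx_j, cy_j) in centers]
        for (cx_i, cy_i) in centers
    ]


def area_mask(boxes: List[Box], area: Box) -> List[bool]:
    """Flag each box whose center lies inside the rectangular area."""
    ax0, ay0, ax1, ay1 = area
    flags = []
    for b in boxes:
        cx, cy = center(b)
        flags.append(ax0 <= cx <= ax1 and ay0 <= cy <= ay1)
    return flags


# Three text blocks: two on the same row, one directly below the first.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30)]
rel = relative_positions(boxes)
masked = area_mask(boxes, (0, 0, 35, 12))  # an area covering the top row
```

In this toy layout, `rel[0][1]` is a purely horizontal offset and `rel[0][2]` a purely vertical one, so the offsets do not depend on the order in which the blocks were read. The area covers only the top row, so the two blocks there are flagged together while the block below is left visible.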

Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Document Classification | RVL-CDIP (test) | Accuracy 95.58 | 306 |
| Information Extraction | CORD (test) | F1 Score 97.4 | 133 |
| Entity extraction | FUNSD (test) | Entity F1 Score 84.52 | 104 |
| Form Understanding | FUNSD (test) | F1 Score 84.52 | 73 |
| Information Extraction | SROIE (test) | F1 Score 96.62 | 58 |
| Semantic Entity Recognition | CORD | F1 Score 97.28 | 55 |
| Information Extraction | FUNSD (test) | F1 Score 84.52 | 55 |
| Entity Linking | FUNSD (test) | F1 Score 77.01 | 42 |
| Entity recognition | CORD official (test) | F1 Score 97.4 | 37 |
| Semantic Entity Recognition | FUNSD (test) | F1 Score 84.5 | 37 |
Showing 10 of 24 rows
