Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
About
Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. This reading order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents that reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method and suggest its potential to be a universal solution to various information extraction tasks on documents.
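The reading order issue described above can be illustrated with a toy example. The sketch below is not the paper's implementation: the function names, the four-token example, and the `links` dictionary (standing in for the output of a prediction head) are all assumptions made for illustration. It shows that when OCR interleaves the tokens of two entities, no BIO tag sequence can recover both, whereas a set of directed token-to-token links can.

```python
from typing import Dict, List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Decode entities from BIO tags; each entity must be a contiguous span."""
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def decode_token_paths(tokens: List[str], links: Dict[int, int]) -> List[str]:
    """Decode entities by following directed links between token indices.

    `links` maps a token index to the index of the next token in the same
    entity (a hypothetical stand-in for predicted graph edges). A path starts
    at any token that has an outgoing link but no incoming link.
    """
    targets = set(links.values())
    starts = sorted(i for i in links if i not in targets)
    entities: List[str] = []
    for start in starts:
        path = [start]
        # Follow links until the path ends; stop early if a cycle appears.
        while path[-1] in links and links[path[-1]] not in path:
            path.append(links[path[-1]])
        entities.append(" ".join(tokens[i] for i in path))
    return entities


# Assumed OCR output for a two-column layout: the entities "John Smith" and
# "12 March" arrive interleaved.
ocr_tokens = ["John", "12", "Smith", "March"]

# A best-effort BIO tagging still cannot separate the two entities.
bio_tags = ["B", "B", "I", "I"]
print(decode_bio(ocr_tokens, bio_tags))

# Directed links John -> Smith and 12 -> March recover both entities.
links = {0: 2, 1: 3}
print(decode_token_paths(ocr_tokens, links))
```

This simplified decoder only handles entities of two or more tokens and assumes each token has at most one outgoing link; how single-token entities and multiple entity types are handled is left out of the sketch.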
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Entity Linking | FUNSD (test) | F1 Score | 79.23 | 42 |
| Visually-rich Document Named Entity Recognition | CORD r (test) | F1 Score | 91.85 | 16 |
| Pair Extraction | RFUND-EN (test) | F1 Score | 50.27 | 16 |
| Reading Order Prediction | ReadingBank | Avg. Page-level BLEU | 98.18 | 12 |
| Visually-rich Document Named Entity Recognition | FUNSD r (test) | F1 Score | 80.4 | 8 |
| Segment-level Reading Order Relation Prediction | ROOR EC-FUNSD (val) | F1 Score | 42.96 | 5 |
| Reading Order Prediction | ReadingBank (test) | ARD (r=100%) | 0.29 | 4 |