The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images
About
Correctly localising tables in a document is instrumental for determining their structure and extracting their contents; therefore, table detection is a key step in table understanding. Nowadays, the most successful methods for table detection in document images employ deep learning algorithms and, in particular, a technique known as fine-tuning. In this context, fine-tuning transfers the knowledge acquired from detecting objects in natural images to the task of detecting tables in document images. However, natural images and document images are only loosely related, and fine-tuning works better when the source and target tasks are closely related. In this paper, we show that it is more beneficial to fine-tune from a closer domain. To this end, we train different object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) on the TableBank dataset (a dataset of images of academic documents designed for table detection and recognition), and fine-tune them on several heterogeneous table detection datasets. Using this approach, we considerably improve the accuracy of the detection models fine-tuned from natural images (by 17% on average and by up to 60% in the best case).
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Table Detection | Marmot English | Recall | 97 | 12 |
| Table Detection | UNLV (test) | Recall | 95 | 12 |
| Table Detection | Marmot Chinese | Recall | 0.96 | 12 |
| Table Detection | ICDAR 60 images 2013 (test) | Recall | 100 | 10 |
| Table Detection | ICDAR archive 2019 | Recall | 95 | 10 |
| Table Detection | TableBank LaTeX 1K (test) | Recall | 99 | 9 |
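
The benchmarks above report recall for table detection. As a rough illustration of how such a figure can be computed, the sketch below counts a ground-truth table as detected when some predicted box overlaps it with an intersection-over-union (IoU) of at least a threshold. The 0.5 threshold, the greedy one-to-one matching, and the names `iou` and `detection_recall` are assumptions made for this example; the individual benchmarks may define their evaluation protocol differently.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_recall(
    ground_truth: List[Box],
    predictions: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Fraction of ground-truth tables matched by a distinct prediction."""
    if not ground_truth:
        return 1.0  # convention chosen here: nothing to find, nothing missed
    used = set()  # indices of predictions already matched
    hits = 0
    for gt in ground_truth:
        best_index, best_score = None, iou_threshold
        for j, pred in enumerate(predictions):
            if j in used:
                continue
            score = iou(gt, pred)
            if score >= best_score:
                best_index, best_score = j, score
        if best_index is not None:
            used.add(best_index)
            hits += 1
    return hits / len(ground_truth)


if __name__ == "__main__":
    tables = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detected = [(0, 0, 10, 9), (100, 100, 110, 110)]
    print(detection_recall(tables, detected))
```

Multiplying the returned fraction by 100 gives a percentage comparable in form to most of the values listed in the table.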