
PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction

About

Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in developing methods, such as vision-language models (VLMs), that can extract tables directly in their full page or document context. However, progress has been difficult to demonstrate due to a lack of annotated data. To address this, we create a new large-scale dataset, PubTables-v2. PubTables-v2 supports a number of challenging table extraction tasks. Notably, it is the first large-scale benchmark for multi-page table structure recognition. We evaluate several smaller specialized VLMs to establish baseline performance on these tasks. As we show, multi-page table recognition is a key gap in current models' capabilities. Interestingly, we show that introducing an image classifier that predicts when to merge tables across pages can significantly improve performance. Data, code, and models will be released at https://huggingface.co/datasets/kensho/PubTables-v2.
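The cross-page improvement described above can be sketched as a simple merging pass: a binary classifier decides, for each pair of table fragments on consecutive pages, whether the second fragment continues the first. The sketch below is a hypothetical illustration, not the authors' released code; the real system classifies rendered page images, while the stand-in classifier here just compares column counts.

```python
# Hedged sketch of cross-page table merging (not the PubTables-v2 code).
# A binary "continuation" classifier decides whether a fragment on page
# i+1 continues the table fragment on page i; matching fragments are
# concatenated into one logical table.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageTable:
    page: int          # page index the fragment appears on
    rows: List[list]   # recognized rows of this fragment


def merge_cross_page_tables(
    fragments: List[PageTable],
    is_continuation: Callable[[PageTable, PageTable], bool],
) -> List[PageTable]:
    """Greedily merge consecutive-page fragments that the classifier
    judges to be the same logical table."""
    merged: List[PageTable] = []
    for frag in sorted(fragments, key=lambda f: f.page):
        if (merged
                and frag.page == merged[-1].page + 1
                and is_continuation(merged[-1], frag)):
            # Continuation: extend the previous logical table.
            merged[-1] = PageTable(frag.page, merged[-1].rows + frag.rows)
        else:
            merged.append(PageTable(frag.page, list(frag.rows)))
    return merged


# Toy stand-in classifier: treat a fragment as a continuation when the
# column counts match. The paper's classifier operates on page images.
def same_width(prev: PageTable, nxt: PageTable) -> bool:
    return bool(prev.rows) and bool(nxt.rows) \
        and len(prev.rows[0]) == len(nxt.rows[0])


frags = [
    PageTable(1, [["a", "b"], ["c", "d"]]),
    PageTable(2, [["e", "f"]]),        # continues the page-1 table
    PageTable(4, [["x", "y", "z"]]),   # unrelated table later on
]
result = merge_cross_page_tables(frags, same_width)
print(len(result))      # 2 logical tables
print(result[0].rows)   # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

Swapping `same_width` for a learned image classifier leaves the merging logic unchanged, which is what makes the classifier a drop-in addition to an existing page-level pipeline.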

Brandon Smock, Valerie Faucon-Morin, Max Sokolov, Libin Liang, Tayyibah Khanam, Amrit Ramesh, Maury Courtland • 2025

Related benchmarks

Task                                           Dataset                                    Result              Rank
Table Structure Recognition                    PubTables cropped tables collection v2     GriTS (Top): 98.03  6
Page-level Table Extraction                    PubTables page-level table extraction v2   GriTS (Top): 96.04  5
Cross-page table continuation classification   PubTables v2                               --                  2
