
PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction

About

Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in developing methods, such as vision-language models (VLMs), that can extract tables directly in their full page or document context. However, progress has been difficult to demonstrate due to a lack of annotated data. To address this, we create a new large-scale dataset, PubTables-v2. PubTables-v2 supports a number of challenging table extraction tasks. Notably, it is the first large-scale benchmark for multi-page table structure recognition. We evaluate several smaller specialized VLMs to establish baseline performance on these tasks. As we show, multi-page table recognition is a key gap in current models' capabilities. Interestingly, we show that introducing an image classifier that predicts when to merge tables across pages can significantly improve performance. Data, code, and models will be released at https://huggingface.co/datasets/kensho/PubTables-v2.
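The cross-page improvement described above can be sketched as a simple merging pass: a binary classifier decides, for each pair of table fragments on consecutive pages, whether the second fragment continues the first. The sketch below is a hypothetical illustration, not the authors' released code; the real system classifies rendered page images, while the stand-in classifier here just compares column counts.

```python
# Hedged sketch of cross-page table merging (not the PubTables-v2 code).
# A binary "continuation" classifier decides whether a fragment on page
# i+1 continues the table fragment on page i; matching fragments are
# concatenated into one logical table.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageTable:
    page: int          # page index the fragment appears on
    rows: List[list]   # recognized rows of this fragment


def merge_cross_page_tables(
    fragments: List[PageTable],
    is_continuation: Callable[[PageTable, PageTable], bool],
) -> List[PageTable]:
    """Greedily merge consecutive-page fragments that the classifier
    judges to be the same logical table."""
    merged: List[PageTable] = []
    for frag in sorted(fragments, key=lambda f: f.page):
        if (merged
                and frag.page == merged[-1].page + 1
                and is_continuation(merged[-1], frag)):
            # Continuation: extend the previous logical table.
            merged[-1] = PageTable(frag.page, merged[-1].rows + frag.rows)
        else:
            merged.append(PageTable(frag.page, list(frag.rows)))
    return merged


# Toy stand-in classifier: treat a fragment as a continuation when the
# column counts match. The paper's classifier operates on page images.
def same_width(prev: PageTable, nxt: PageTable) -> bool:
    return bool(prev.rows) and bool(nxt.rows) \
        and len(prev.rows[0]) == len(nxt.rows[0])


frags = [
    PageTable(1, [["a", "b"], ["c", "d"]]),
    PageTable(2, [["e", "f"]]),        # continues the page-1 table
    PageTable(4, [["x", "y", "z"]]),   # unrelated table later on
]
result = merge_cross_page_tables(frags, same_width)
print(len(result))      # 2 logical tables
print(result[0].rows)   # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

Swapping `same_width` for a learned image classifier leaves the merging logic unchanged, which is what makes the classifier a drop-in addition to an existing page-level pipeline.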

Brandon Smock, Valerie Faucon-Morin, Max Sokolov, Libin Liang, Tayyibah Khanam, Amrit Ramesh, Maury Courtland • 2025

Related benchmarks

Task                                           Dataset                                    Result              Rank
Table Structure Recognition                    PubTables cropped tables collection v2     GriTS (Top): 98.03  6
Page-level Table Extraction                    PubTables page-level table extraction v2   GriTS (Top): 96.04  5
Cross-page table continuation classification   PubTables v2                               --                  2
