TURL: Table Understanding through Representation Learning
About
Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner. Its universal model design with pre-trained representations can be applied to a wide range of tasks with minimal task-specific fine-tuning. Specifically, we propose a structure-aware Transformer encoder to model the row-column structure of relational tables, and present a new Masked Entity Recovery (MER) objective for pre-training to capture the semantics and knowledge in large-scale unlabeled data. We systematically evaluate TURL with a benchmark consisting of 6 different tasks for table understanding (e.g., relation extraction, cell filling). We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
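As a rough illustration of what "structure-aware" could mean for an attention-based encoder, the sketch below builds a boolean mask in which a table cell may attend only to cells in its own row or column. This is an assumption made for illustration, since the abstract does not describe the mechanism. The function name `row_column_mask` and the row-major cell ordering are hypothetical.

```python
from typing import List


def row_column_mask(n_rows: int, n_cols: int) -> List[List[bool]]:
    """Return mask[i][j] == True when cell i may attend to cell j.

    Cells are flattened in row-major order. Two cells are mutually
    visible when they share a row or a column (an illustrative
    assumption, not a mechanism stated in the abstract).
    """
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return [
        [ri == rj or ci == cj for (rj, cj) in cells]
        for (ri, ci) in cells
    ]


# A 2x3 table flattens to six cells: (0,0), (0,1), (0,2), (1,0), (1,1), (1,2).
mask = row_column_mask(2, 3)
```

A mask of this kind would typically be applied to the attention scores so that disallowed cell pairs contribute nothing. In the 2x3 case, cell (0,0) sees (0,1) and (1,0) but not (1,1).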
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Measure Type Prediction | AnaMeta | Accuracy | 66.92 | 14 |
| Aggregation Classification | AnaMeta | Accuracy | 91.15 | 10 |
| Common Breakdown Identification | AnaMeta | HR@1 | 62.38 | 10 |
| Dimension Type Classification | AnaMeta | Accuracy | 96.55 | 10 |
| Common Measure Identification | AnaMeta | HR@1 | 70.37 | 10 |
| Measure Identification | AnaMeta | Accuracy | 97.65 | 10 |
| Measure Pair Identification | AnaMeta | Accuracy | 68.29 | 10 |
| Natural Key Identification | AnaMeta | Accuracy | 92.65 | 10 |
| Column Type Annotation | Freebase (test) | F1 Score | 94.75 | 7 |
| Entity Linking | WikiGS (test) | F1 Score | 67 | 7 |