MATE: Multi-view Attention for Table Transformer Efficiency
About
This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web and rich in information, yet more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008). Such large tables are a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables. MATE uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. In both speed and memory, this architecture scales linearly with sequence length and can handle documents containing more than 8000 tokens on current accelerators. MATE also has a more appropriate inductive bias for tabular data, and it sets a new state of the art on three table reasoning datasets. On HybridQA (Chen et al., 2020b), a dataset of large documents containing tables, we improve the best prior result by 19 points.
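To make the row/column attention pattern concrete, here is a minimal sketch of how a per-head sparse mask could be built: each table token attends only to tokens in its own row (for a "row" head) or its own column (for a "column" head), while out-of-table tokens (e.g. the query) attend globally. This is an illustrative reconstruction, not the authors' implementation; the function name and the convention that row/column id 0 marks non-table tokens are assumptions.

```python
import numpy as np

def table_attention_mask(row_ids, col_ids, head_type):
    """Boolean attention mask for one sparse head (illustrative sketch).

    row_ids, col_ids: 1-D int sequences giving each token's table row and
    column; (0, 0) marks tokens outside the table, e.g. the query text.
    head_type: "row" or "column". In a row head a table token may attend
    only to tokens in the same row; a column head is symmetric.
    """
    row_ids = np.asarray(row_ids)
    col_ids = np.asarray(col_ids)
    is_global = (row_ids == 0) & (col_ids == 0)
    if head_type == "row":
        same = row_ids[:, None] == row_ids[None, :]
    else:
        same = col_ids[:, None] == col_ids[None, :]
    # Global (non-table) tokens attend to, and are attended by, everyone.
    return same | is_global[:, None] | is_global[None, :]

# Tiny example: 2 query tokens followed by a 2x2 table.
row_ids = [0, 0, 1, 1, 2, 2]
col_ids = [0, 0, 1, 2, 1, 2]
mask = table_attention_mask(row_ids, col_ids, "row")
```

Because each table token attends to only its row (or column) plus a few global tokens, the number of nonzero mask entries grows linearly with the number of cells rather than quadratically with sequence length, which is what enables 8000+ token inputs.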
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Table Fact Verification | TabFact (test) | Accuracy | 81.4 | 98 |
| Table Question Answering | WikiTQ (test) | Accuracy | 51.5 | 92 |
| Table Question Answering | WikiTableQuestions (test) | -- | -- | 86 |
| Sequential Question Answering | SQA (test) | Accuracy (All) | 71.7 | 33 |
| Question Answering | HybridQA (test) | EM (Total) | 62.8 | 23 |
| Binary Classification | TabFact (test) | Accuracy | 81.4 | 18 |
| Question Answering | HybridQA (dev) | EM (Total) | 63.4 | 17 |
| Table Question Answering | SQA (test) | Accuracy (All) | 71.7 | 11 |
| Text-to-SQL | WikiSQL (test) | -- | -- | 8 |
| Table Reasoning | Synthetic dataset | Accuracy | 79.2 | 6 |