ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
About
Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types which are fixed at training time; require a large number of training samples per type and incur large run-time inference costs; and their performance can degrade when evaluated on novel datasets, even when types remain constant. Large language models have exhibited strong zero-shot classification performance on a wide range of tasks and in this paper we explore their use for CTA. We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner. We ablate each component of our method separately, and establish that improvements to context sampling and label remapping provide the most consistent gains. ArcheType establishes a new state-of-the-art performance on zero-shot CTA benchmarks (including three new domain-specific benchmarks which we release along with this paper), and when used in conjunction with classical CTA techniques, it outperforms a SOTA DoDuo model on the fine-tuned SOTAB benchmark. Our code is available at https://github.com/penfever/ArcheType.
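The four stages named in the abstract (context sampling, prompt serialization, model querying, label remapping) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the linked repository. Every function name, the prompt wording, and the remapping heuristic (exact match, then substring match, then closest string via `difflib`) are assumptions made for illustration, and the model is any caller-supplied function that maps a prompt string to a response string.

```python
import difflib
import random
from typing import Callable, List, Sequence


def sample_context(values: Sequence[str], k: int = 5, seed: int = 0) -> List[str]:
    """Context sampling (hypothetical strategy): up to k distinct non-empty cells."""
    unique = sorted({v.strip() for v in values if v and v.strip()})
    if len(unique) <= k:
        return unique
    return random.Random(seed).sample(unique, k)


def serialize_prompt(context: Sequence[str], labels: Sequence[str]) -> str:
    """Prompt serialization (hypothetical wording): values plus the allowed types."""
    return (
        "Column values: " + ", ".join(context) + "\n"
        "Options: " + ", ".join(labels) + "\n"
        "Answer with exactly one option.\n"
    )


def remap_label(raw: str, labels: Sequence[str]) -> str:
    """Label remapping (assumed heuristic): force a free-form answer into the label set."""
    if not labels:
        raise ValueError("labels must not be empty")
    cleaned = raw.strip().lower()
    lowered = {label.lower(): label for label in labels}
    if cleaned in lowered:  # exact match
        return lowered[cleaned]
    for key, label in lowered.items():  # label mentioned inside a longer answer
        if key in cleaned:
            return label
    # Fall back to the most similar label; cutoff=0.0 always yields a candidate.
    best = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=0.0)
    return lowered[best[0]]


def annotate_column(
    values: Sequence[str],
    labels: Sequence[str],
    model: Callable[[str], str],
) -> str:
    """Run the four stages in order and return one label from `labels`."""
    prompt = serialize_prompt(sample_context(values), labels)
    return remap_label(model(prompt), labels)
```

Because the model is passed in as a plain callable, the sketch can be exercised with a stub (for example, `lambda prompt: "I think it's a Country."`) before wiring in a real LLM client. The remapping step is what guarantees that the returned value is always one of the allowed types, even when the model answers in free text.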
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Column Type Annotation | Efthymiou Cross-Domain | F1 Score | 67.7 | 14 |
| Column Type Annotation | Limaye Cross-Domain | F1 Score | 78.3 | 14 |
| Column Type Annotation | T2D Cross-Domain | F1 Score | 88 | 14 |
| Semantic Type Annotation | WikiTable | Micro F1 Score | 76.7 | 12 |
| Semantic Type Annotation | SOTAB sch-s | Micro-F1 | 83 | 12 |
| Semantic Type Annotation | SOTABsch | Micro-F1 | 0.851 | 12 |
| Semantic Type Annotation | SOTAB dbp | Micro-F1 | 83.6 | 12 |
| Semantic Type Annotation | T2D | Micro-F1 | 88 | 12 |
| Column Type Annotation | SOTABdbp Cross-Ontology | F1 Score | 48.3 | 10 |
| Column Type Annotation | SOTABsch Cross-Ontology | F1 Score | 48.3 | 10 |