Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings
About
Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Word2Vec's Skip-gram) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; and 2) proposing a post-processing retrofitting method for static embeddings, independent of training, that employs a priori synonym knowledge and weighted vector distributions. On both extrinsic and intrinsic tasks, our methods are shown to outperform the baselines by a large margin.
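The abstract describes the retrofitting step only at a high level. As an illustration of the general idea (not the paper's exact method), a minimal sketch of classic synonym-based retrofitting in the style of Faruqui et al. is shown below: each word vector is iteratively pulled toward its synonyms' vectors while staying close to its original embedding. The `alpha` and `beta` weights and the toy vocabulary are hypothetical.

```python
import numpy as np

def retrofit(vectors, synonyms, alpha=1.0, beta=1.0, iters=10):
    """Iteratively blend each word's vector with its synonyms' vectors.

    vectors:  dict mapping word -> original np.ndarray embedding
    synonyms: dict mapping word -> list of synonym words
    alpha:    weight on staying close to the original vector (assumed)
    beta:     weight on each synonym neighbor (assumed)
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for w, neighbors in synonyms.items():
            nbrs = [n for n in neighbors if n in new]
            if w not in new or not nbrs:
                continue
            # Weighted average of the original vector and current
            # synonym vectors (the classic retrofitting update rule).
            total = alpha * vectors[w] + beta * sum(new[n] for n in nbrs)
            new[w] = total / (alpha + beta * len(nbrs))
    return new

# Toy example: "happy" and "glad" are marked as synonyms.
vecs = {"happy": np.array([1.0, 0.0]),
        "glad":  np.array([0.0, 1.0]),
        "table": np.array([0.5, 0.5])}
syn = {"happy": ["glad"], "glad": ["happy"]}
out = retrofit(vecs, syn)
```

After retrofitting, the two synonyms end up closer together than in the original space, while words with no synonym entries (here `"table"`) are left untouched — which is what makes this kind of post-processing cheap and safe to apply to any pre-trained static embedding.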
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Chunking | CoNLL 2000 (test) | F1 Score | 91.98 | 88 |
| Named Entity Recognition | OntoNotes 4.0 (test) | F1 Score | 89.52 | 55 |
| Word Similarity | WS-353 | Spearman Correlation | 0.789 | 54 |
| Part-of-Speech Tagging | WSJ (test) | Accuracy | 96.91 | 51 |
| Word Similarity | RG-65 | Spearman Correlation | 0.851 | 35 |
| Word Similarity | WS-353 REL (test) | Spearman Correlation | 0.701 | 28 |
| Word Similarity | SimLex-999 | Spearman Correlation | 55.2 | 23 |
| Word Concept Categorization | AP, Battig, ESSLI (test) | AP Score | 66.4 | 11 |
| Word Similarity | Rare Word (RW) | Spearman Correlation | 44 | 7 |
| Word Analogy | Google Analogy | Accuracy | 76.3 | 5 |