
Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

About

Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; and 2) proposing a post-processing retrofitting method for static embeddings, independent of training, that employs a priori synonym knowledge and a weighted vector distribution. On both extrinsic and intrinsic tasks, our methods are shown to outperform the baselines by a large margin.
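The post-processing idea described above can be illustrated with a minimal, Faruqui-style retrofitting sketch: each word vector is iteratively pulled toward the vectors of its lexicon synonyms while staying anchored to its original position. This is a generic sketch of the technique, not the paper's exact weighted formulation; the `alpha` and `beta` weights and the toy lexicon are illustrative assumptions.

```python
import numpy as np

def retrofit(vectors, synonyms, iterations=10, alpha=1.0, beta=1.0):
    """Iteratively pull each word vector toward its synonyms.

    vectors:  dict word -> np.ndarray (original static embeddings)
    synonyms: dict word -> list of synonym words (a priori lexicon)
    alpha:    weight anchoring a word to its original vector (assumed)
    beta:     weight pulling it toward each synonym neighbour (assumed)
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in synonyms.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            # Closed-form update: weighted mean of the original vector
            # and the current vectors of the word's synonyms.
            num = alpha * vectors[word] + beta * sum(new[n] for n in nbrs)
            new[word] = num / (alpha + beta * len(nbrs))
    return new

# Toy example with 2-d vectors and a made-up synonym lexicon.
vecs = {"happy": np.array([1.0, 0.0]),
        "glad":  np.array([0.0, 1.0]),
        "sad":   np.array([-1.0, 0.0])}
syn = {"happy": ["glad"], "glad": ["happy"]}
out = retrofit(vecs, syn)
```

After retrofitting, "happy" and "glad" move closer together, while "sad", which has no lexicon entry, is left untouched.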

Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, Stan Z. Li • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Chunking | CoNLL 2000 (test) | F1 Score | 91.98 | 88 |
| Named Entity Recognition | OntoNotes 4.0 (test) | F1 Score | 89.52 | 55 |
| Word Similarity | WS-353 | Spearman Correlation | 0.789 | 54 |
| Part-of-Speech Tagging | WSJ (test) | Accuracy | 96.91 | 51 |
| Word Similarity | RG-65 | Spearman Correlation | 0.851 | 35 |
| Word Similarity | WS-353 REL (test) | Spearman Correlation | 0.701 | 28 |
| Word Similarity | SimLex-999 | Spearman Correlation | 55.2 | 23 |
| Word Concept Categorization | AP, Battig, ESSLI (test) | AP Score | 66.4 | 11 |
| Word Similarity | Rare Word (RW) | Spearman Correlation | 44 | 7 |
| Word Analogy | Google Analogy | Accuracy | 76.3 | 5 |

Showing 10 of 12 rows.
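The word-similarity rows above report Spearman correlation, which is the standard way such benchmarks are scored: cosine similarities between embedding pairs are rank-correlated with human ratings. A minimal sketch of that evaluation follows; the toy vectors, word pairs, and human scores are made up for illustration, and the rank computation does not average tied ranks.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks.

    Note: double argsort assigns integer ranks and does not average
    ties, which is fine for this illustration.
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Toy embeddings and (word1, word2, human rating) pairs -- illustrative only.
vecs = {"happy": np.array([0.9, 0.3]),
        "glad":  np.array([0.8, 0.4]),
        "sad":   np.array([-0.7, 0.2])}
pairs = [("happy", "glad", 9.0), ("happy", "sad", 1.5), ("glad", "sad", 2.0)]

model = np.array([cosine(vecs[a], vecs[b]) for a, b, _ in pairs])
human = np.array([s for _, _, s in pairs])
rho = spearman(model, human)
```

Here the model's similarity ranking matches the human ranking exactly, so `rho` is 1.0; on real datasets such as WS-353 or SimLex-999 the correlation is computed over hundreds of pairs.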

Other info

Code
