word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
About
Deep learning natural language processing models often use vector word embeddings, such as word2vec or GloVe, to represent words. A discrete sequence of words can be much more easily integrated with downstream neural layers if it is represented as a sequence of continuous vectors. Also, semantic relationships between words, learned from a text corpus, can be encoded in the relative configurations of the embedding vectors. However, storing and accessing embedding vectors for all words in a dictionary requires a large amount of space, and may strain systems with limited GPU memory. Here, we use approaches inspired by quantum computing to propose two related methods, *word2ket* and *word2ketXS*, for storing the word embedding matrix during training and inference in a highly space-efficient way. Our approach achieves a hundred-fold or greater reduction in the space required to store the embeddings, with almost no relative drop in accuracy on practical natural language processing tasks.
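The core idea is to represent each embedding vector as a sum of tensor (Kronecker) products of several much smaller vectors, analogous to an entangled quantum state, so that a high-dimensional vector is stored using only a handful of small factors. The sketch below is a minimal PyTorch illustration of this idea, not the released word2ket implementation; the class name `TensorProductEmbedding` and the parameters `q`, `order`, and `rank` are placeholders chosen here for exposition.

```python
# Minimal sketch of a word2ket-style embedding layer (illustrative only,
# not the authors' official code). Each word's embedding is a sum of
# `rank` tensor products of `order` small q-dimensional vectors, so the
# exposed dimension is q ** order while only rank * order * q parameters
# are stored per word.
import torch
import torch.nn as nn


class TensorProductEmbedding(nn.Module):
    def __init__(self, vocab_size: int, q: int = 4, order: int = 4, rank: int = 2):
        super().__init__()
        self.q, self.order, self.rank = q, order, rank
        self.embedding_dim = q ** order  # e.g. 4 ** 4 = 256
        # One small q-dimensional factor per (word, rank, order) slot.
        self.factors = nn.Parameter(torch.randn(vocab_size, rank, order, q) * 0.1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Gather factors for the requested tokens: (..., rank, order, q).
        f = self.factors[token_ids]
        # Build the Kronecker product of the `order` small vectors.
        out = f[..., 0, :]  # (..., rank, q)
        for j in range(1, self.order):
            out = torch.einsum("...i,...j->...ij", out, f[..., j, :])
            out = out.reshape(*out.shape[:-2], -1)
        # Sum over the rank dimension to obtain the full embedding.
        return out.sum(dim=-2)  # (..., q ** order)


if __name__ == "__main__":
    emb = TensorProductEmbedding(vocab_size=10_000, q=4, order=4, rank=2)
    vectors = emb(torch.tensor([[1, 5, 42]]))
    print(vectors.shape)  # torch.Size([1, 3, 256])
```

With q = 4, order = 4, and rank = 2, this layer exposes 256-dimensional vectors while storing only 2 × 4 × 4 = 32 parameters per word; word2ketXS pushes the savings further by applying the same factorization to the embedding matrix as a whole rather than row by row.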
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Natural Language Inference | SNLI | Accuracy | 84.87 | 174 |
| Machine Translation | En-Es | BLEU | 39.1 | 10 |
| Machine Translation | En-It | BLEU | 32.6 | 10 |
| Machine Translation | En-Ru | BLEU | 31.5 | 10 |
| Question Answering | WikiQA | MAP | 0.6842 | 8 |