What is GloVe:
GloVe is an unsupervised learning algorithm for creating vector representations of words, such that similar words cluster together while unrelated words are pushed apart. Training is performed on aggregated global word-word co-occurrence statistics from a given corpus, rather than relying only on local contextual information about the words.
What is Word2Vec:
Word2Vec was developed by Tomas Mikolov at Google in 2013 and is one of the most widely used methods for building word embeddings with shallow neural networks. It can capture the context of a word in a given document and find semantic and syntactic similarity, relations with other words, etc.
We often need to convert pre-trained GloVe vectors into the Word2Vec embedding format so that they can be fed to a larger neural network, e.g. an LSTM or GRU.
Here we will explain how to convert pre-trained GloVe vectors into the Word2Vec format using Gensim's implementation of the Word2Vec algorithm.
# import required methods from gensim package
from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# create temp file and save converted embedding into it
target_file = get_tmpfile('word2vec.6B.300d.txt')
glove2word2vec('glove.6B.300d.txt', target_file)

# load the converted embedding into memory
model = KeyedVectors.load_word2vec_format(target_file)

# save as binary data
model.save_word2vec_format('word2vec.6B.300d.bin.gz', binary=True)
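Once the embeddings have been converted and saved, they can be loaded back with KeyedVectors and queried directly. The short sketch below assumes the file name used in the snippet above:

from gensim.models import KeyedVectors

# load the binary embeddings saved above
model = KeyedVectors.load_word2vec_format('word2vec.6B.300d.bin.gz', binary=True)

# look up the 300-dimensional vector for a word
vector = model['king']
print(vector.shape)  # (300,)

# find the words closest to a query word by cosine similarity
print(model.most_similar('king', topn=5))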
You can download pre-trained GloVe vectors from here: https://nlp.stanford.edu/projects/glove/
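As mentioned above, the converted vectors can then be fed into a larger neural network such as an LSTM or GRU. The sketch below shows one way to initialise a frozen embedding layer from the converted KeyedVectors; it assumes PyTorch and the Gensim 4.x API (key_to_index), neither of which is prescribed by the snippet above:

import torch
import torch.nn as nn
from gensim.models import KeyedVectors

# load the converted embeddings (file name from the snippet above)
kv = KeyedVectors.load_word2vec_format('word2vec.6B.300d.bin.gz', binary=True)

# kv.vectors is a (vocab_size, 300) NumPy matrix of the GloVe vectors
weights = torch.tensor(kv.vectors)

# initialise an embedding layer with the pre-trained weights and freeze it
embedding = nn.Embedding.from_pretrained(weights, freeze=True)

# map a token to its row index (Gensim 4.x; on 3.x use kv.vocab['king'].index)
idx = torch.tensor([kv.key_to_index['king']])
print(embedding(idx).shape)  # torch.Size([1, 300])

The frozen embedding layer can then be placed in front of an LSTM or GRU and the rest of the network trained as usual.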