Weighted word2vec based on the distance of words
Abstract
Word2vec is a novel technique for the study and application of natural language processing (NLP). It trains a word-embedding neural network model on a large corpus; after training, each word is represented by a vector in a specified vector space. These vectors possess many interesting and useful characteristics implicitly inherited from the original words. The idea behind word2vec is that words are related if they appear in the same neighborhood, and these relations are exploited by training the network model over various context windows. However, word2vec does not consider the influence of the distance between words; it only considers whether or not they appear in the same context window. We argue that word distances within the context carry semantic information that can be exploited to train a better network model. To formalize the influence of different distances within the context, fuzzy concepts are adopted. Various experiments show that our proposed improvement yields better language models than word2vec.
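The abstract does not specify the exact fuzzy membership function used, so the following is only an illustrative sketch of the core idea: assigning each context word a weight that decreases with its distance from the center word, here using an assumed triangular membership function. The function names and the weighting form are assumptions, not the paper's actual implementation.

```python
def fuzzy_weight(distance, window):
    """Assumed triangular fuzzy membership: a word adjacent to the
    center gets weight 1.0, and the weight falls off linearly as the
    word approaches the edge of the context window."""
    return max(0.0, 1.0 - (distance - 1) / window)

def weighted_context(tokens, center, window):
    """Return (context_word, weight) pairs for the token at `center`,
    where nearer context words receive larger training weights."""
    pairs = []
    for offset in range(-window, window + 1):
        pos = center + offset
        if offset == 0 or pos < 0 or pos >= len(tokens):
            continue  # skip the center word and out-of-range positions
        pairs.append((tokens[pos], fuzzy_weight(abs(offset), window)))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
# Context of "jumps" (index 4) with window 2: adjacent words weighted
# 1.0, words at the window edge weighted 0.5.
print(weighted_context(sentence, center=4, window=2))
# → [('brown', 0.5), ('fox', 1.0), ('over', 1.0), ('the', 0.5)]
```

In standard word2vec every pair above would contribute equally to the training objective; under the proposed scheme, the per-pair weight would scale each pair's contribution instead.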