Multi-Sense Embeddings per Word
Abstract
Word embeddings have recently been applied successfully to many natural language processing problems, and training robust, accurate word embeddings efficiently is an active research area. Since many, if not all, words have more than one sense, it is necessary to learn a separate vector for each sense of a word. In this project, we therefore explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-Parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, the Incremental Multi-Sense Skip-gram (IMSSG) model, which learns the vectors of all senses of a word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG outperforms the other models.
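The core step shared by MSSG-style models is hard sense assignment: the context words around an occurrence are averaged, and the occurrence is assigned to the sense whose cluster center is most similar to that averaged context. The sketch below illustrates this step only; the function name and toy vectors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def select_sense(context_vecs, sense_clusters):
    """Pick the sense whose cluster center is most similar (cosine)
    to the averaged context vector -- the hard sense-assignment step
    used in MSSG-style models (illustrative sketch)."""
    context = np.mean(context_vecs, axis=0)
    # Cosine similarity between the context and each sense cluster center.
    norms = np.linalg.norm(sense_clusters, axis=1) * np.linalg.norm(context)
    sims = sense_clusters @ context / np.maximum(norms, 1e-12)
    return int(np.argmax(sims))

# Toy example: a word with 3 senses in a 4-dimensional space.
clusters = np.array([
    [1.0, 0.0, 0.0, 0.0],   # sense 0
    [0.0, 1.0, 0.0, 0.0],   # sense 1
    [0.0, 0.0, 1.0, 0.0],   # sense 2
])
context = [np.array([0.1, 0.9, 0.0, 0.0]),
           np.array([0.0, 0.8, 0.2, 0.0])]
print(select_sense(context, clusters))  # → 1
```

In the full models, the chosen sense's vector is then updated with the usual skip-gram gradient, and the cluster center is updated toward the observed context; NP-MSSG additionally spawns a new sense when no existing cluster is similar enough.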