A Study of Chinese Document Representation and Classification with Word2vec
2016, pp. 298–302
Abstract
Word2vec is a neural network language model that maps words and phrases to high-quality distributed vectors (called word embeddings) that capture semantic relationships between words, so it offers a distinctive perspective on text classification and other natural language processing (NLP) tasks. In this paper, we propose combining an improved tf-idf algorithm with word embeddings to represent documents, and we conduct text classification experiments on the Sogou Chinese classification corpus. Our results show that the combination of word embeddings and the improved tf-idf algorithm outperforms either method used alone.
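One common way to combine the two signals is to average a document's word vectors using tf-idf scores as weights. The sketch below illustrates that general idea only; it is not the paper's improved tf-idf algorithm, whose details the abstract does not give. The toy corpus, the 2-dimensional embeddings, the smoothed idf formula, and the helper names `idf_table` and `doc_vector` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy, pre-tokenized documents (assumption: real Chinese text would first
# need a word segmenter, and a real corpus would be far larger).
docs = [
    ["股票", "市场", "上涨"],
    ["球队", "比赛", "胜利"],
    ["股票", "比赛"],
]

# Hypothetical 2-d vectors standing in for trained word2vec embeddings.
embeddings = {
    "股票": [1.0, 0.0],
    "市场": [0.9, 0.1],
    "上涨": [0.8, 0.2],
    "球队": [0.0, 1.0],
    "比赛": [0.1, 0.9],
    "胜利": [0.2, 0.8],
}


def idf_table(corpus):
    """Plain smoothed idf, log(N / df) + 1 (not the paper's improved variant)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def doc_vector(tokens, idf, emb):
    """Tf-idf weighted average of the word vectors in one document."""
    term_freq = Counter(tokens)
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    total_weight = 0.0
    for word, count in term_freq.items():
        if word not in emb or word not in idf:
            continue  # skip out-of-vocabulary words
        weight = (count / len(tokens)) * idf[word]
        total_weight += weight
        for i, value in enumerate(emb[word]):
            vec[i] += weight * value
    if total_weight == 0.0:
        return vec  # no known words: return the zero vector
    return [value / total_weight for value in vec]


idf = idf_table(docs)
vectors = [doc_vector(doc, idf, embeddings) for doc in docs]
```

Each entry of `vectors` is a fixed-length document representation that could be passed to any standard classifier. Words that occur in fewer documents receive a larger idf and therefore pull the document vector more strongly toward their own embedding.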