A Study of Chinese Document Representation and Classification with Word2vec

2016, pp. 298–302

Citations over time: top 10% of 2016 papers

Abstract

Word2vec is a neural network language model that converts words and phrases into high-quality distributed vectors (called word embeddings) that capture semantic relationships between words, so it offers a distinctive perspective on text classification and other natural language processing (NLP) tasks. In this paper, we propose combining an improved tf-idf algorithm with word embeddings to represent documents, and we conduct text classification experiments on the Sogou Chinese classification corpus. Our results show that the combination of word embeddings and the improved tf-idf algorithm outperforms either method used individually.
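One common way to combine tf-idf with word embeddings is to represent each document as the tf-idf-weighted average of its word vectors. The sketch below illustrates that general idea only. The abstract does not specify the paper's improved tf-idf variant or how the two representations are combined, so the standard tf-idf formula, the toy two-dimensional vectors in `embeddings`, and the helper names `tfidf_weights` and `doc_vector` are all assumptions made for illustration. In practice the vectors would come from a trained Word2vec model.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Uses plain tf-idf (an assumption; not the paper's improved variant):
    tf = count / document length, idf = log(N / df) + 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return weights


def doc_vector(doc_weights, embeddings, dim):
    """Weighted average of word vectors, using tf-idf weights."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in doc_weights.items():
        vec = embeddings.get(term)
        if vec is None:
            continue  # skip out-of-vocabulary terms
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known terms: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical pre-segmented Chinese documents (sports vs. finance).
docs = [
    ["足球", "比赛", "球队"],
    ["股票", "市场", "比赛"],
]

# Hypothetical 2-dimensional embeddings standing in for Word2vec output.
embeddings = {
    "足球": [1.0, 0.0],
    "比赛": [0.5, 0.5],
    "球队": [0.9, 0.1],
    "股票": [0.0, 1.0],
    "市场": [0.1, 0.9],
}

weights = tfidf_weights(docs)
vectors = [doc_vector(w, embeddings, dim=2) for w in weights]
```

The resulting `vectors` could then be passed to any standard classifier. Because "比赛" appears in both toy documents, it receives a lower weight than the topic-specific terms, so each document vector is pulled toward its distinctive vocabulary.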
