A comprehensive comparative study on term weighting schemes for text categorization with support vector machines
2005, pp. 1032–1032
Abstract
A term weighting scheme, which converts documents into vectors in the term space, is a vital step in automatic text categorization. In this paper, we conducted comprehensive experiments to compare various term weighting schemes with SVM on two widely used benchmark data sets. We also presented a new term weighting scheme, tf-rf, to improve a term's discriminating power. The controlled experimental results showed that the newly proposed tf-rf scheme is significantly better than other widely used term weighting schemes. Compared with schemes based on the tf factor alone, adding the idf factor does not improve, and may even decrease, a term's discriminating power for text categorization.
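The abstract does not define tf-rf, so the following is only a minimal sketch of how the two kinds of weight might differ. It assumes the relevance-frequency form rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term; that formula, the function names, and the document counts are illustrative assumptions, not taken from the text above.

```python
import math


def tf_idf(tf, n_docs, doc_freq):
    """Classic tf-idf: idf depends only on how many documents contain the term."""
    return tf * math.log2(n_docs / doc_freq)


def tf_rf(tf, pos_docs_with_term, neg_docs_with_term):
    """Assumed tf-rf form: rf grows with the ratio of positive-category to
    negative-category documents containing the term (assumption, see lead-in)."""
    rf = math.log2(2 + pos_docs_with_term / max(1, neg_docs_with_term))
    return tf * rf


if __name__ == "__main__":
    # Hypothetical counts: both terms occur in 10 of 100 documents,
    # but term A concentrates in the positive category and term B does not.
    n_docs = 100
    term_a = {"tf": 3, "pos": 8, "neg": 2}
    term_b = {"tf": 3, "pos": 2, "neg": 8}

    for name, t in (("A", term_a), ("B", term_b)):
        idf_weight = tf_idf(t["tf"], n_docs, t["pos"] + t["neg"])
        rf_weight = tf_rf(t["tf"], t["pos"], t["neg"])
        print(f"term {name}: tf-idf={idf_weight:.3f}  tf-rf={rf_weight:.3f}")
```

Under these assumptions, both terms receive the same tf-idf weight because idf ignores category labels, while the assumed rf factor gives the term concentrated in the positive category a larger weight.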