An Improved K-means Text Clustering Algorithm by Optimizing Initial Cluster Centers
Top 10% of 2016 papers by citations
Abstract
The K-means clustering algorithm is an influential algorithm in data mining. The traditional K-means algorithm is sensitive to the choice of initial cluster centers, so its clustering result depends excessively on those centers. To overcome this shortcoming, this paper proposes an improved K-means text clustering algorithm that optimizes the initial cluster centers. The algorithm first calculates the density of each data object in the data set and then determines which data objects are isolated points. Removing all isolated points yields a set of high-density data objects. From this set, the algorithm chooses k high-density data objects that are as far apart from one another as possible as the initial cluster centers. The experimental results show that the improved K-means algorithm can improve the stability and accuracy of text clustering.
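The center-selection procedure described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation. The abstract does not define density or the isolation test, so the sketch assumes that a point's density is the number of other points within a radius `eps`, that a point is isolated when its density is below `min_density`, and that distances are Euclidean on numeric document vectors (for example TF-IDF vectors). The "largest distance" condition is approximated greedily (farthest-first): start from the densest point, then repeatedly add the high-density point farthest from every center chosen so far. The names `densities`, `initial_centers` and `kmeans` are illustrative only.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]  # assumed: a numeric document vector, e.g. TF-IDF weights


def densities(points: List[Point], eps: float) -> List[int]:
    """Assumed density: number of OTHER points within Euclidean distance eps."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]


def initial_centers(points: List[Point], k: int, eps: float,
                    min_density: int = 1) -> List[List[float]]:
    """Pick k high-density, mutually distant points as initial centers."""
    dens = densities(points, eps)
    # Assumed isolation test: fewer than min_density neighbours within eps.
    dense = [(p, d) for p, d in zip(points, dens) if d >= min_density]
    if len(dense) < k:
        raise ValueError("fewer than k high-density points; try a larger eps")
    # Start from the densest point (ties: first one in input order).
    centers = [max(dense, key=lambda pd: pd[1])[0]]
    candidates = [p for p, _ in dense]
    while len(centers) < k:
        # Greedy farthest-first: add the candidate whose nearest chosen
        # center is as far away as possible.
        centers.append(max(candidates,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points: List[Point], centers: List[List[float]],
           max_iter: int = 100) -> Tuple[Optional[List[int]], List[List[float]]]:
    """Standard Lloyd-style K-means started from the given centers."""
    centers = [list(c) for c in centers]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Toy 2-D data: two tight groups plus one isolated outlier at (50, 50).
    data = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 11], [11, 10], [11, 11],
            [50, 50]]
    init = initial_centers(data, k=2, eps=1.5)
    print(init)    # [[0, 0], [11, 11]]
    labels, centers = kmeans(data, init)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy data the point (50, 50) has no neighbor within `eps`, so it is discarded before selection; plain farthest-first seeding over all points, starting from the same first center, would have picked it as the second center. The sketch excludes isolated points only from center selection; K-means still assigns them to a cluster.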
Related Papers
- Affinity propagation clustering algorithm based on large-scale data-set (2018), 15 citations
- Semi-supervised Affinity Propagation Clustering (2007)
- A Weighted Subspace Clustering Algorithm in High-Dimensional Data Streams (2009), 3 citations
- A high-order affinity propagation clustering algorithm based on tensor distance (2016)