Cluster stability and the use of noise in interpretation of clustering
Citations over time: top 10% of 2005 papers
Abstract
A clustering and ordination algorithm suitable for mining extremely large databases, including those produced by microarray expression studies, is described and analyzed for stability. Data from a yeast cell-cycle experiment with 6000 genes and 18 experimental measurements per gene are used to test the algorithm under practical conditions. Ordination, the process of assigning each database object to an X,Y coordinate, is shown to be stable with respect to random starting conditions and to minor perturbations in the starting similarity estimates. Careful analysis of how clusters typically co-locate, compared with the occasional large displacements that occur under different starting conditions, is shown to be useful in interpreting the data. This extra stability information is lost when only a single clustering is reported, which is currently the accepted practice. We believe, however, that the approaches presented here should become a standard part of best practice in analyzing computer clustering of large data collections.
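The abstract does not spell out the stability procedure, so what follows is only a minimal sketch of one common way to run such a check, not the authors' method. It assumes a simple stress-reducing 2-D layout (pairwise stochastic updates, in the spirit of metric MDS) as a hypothetical stand-in for the paper's ordination algorithm. The layout is run from several random seeds and once on slightly perturbed distances; each result is aligned to a reference run with a 2-D Procrustes fit (removing translation, scale, rotation and reflection), and the per-object displacement is reported. All function names (`ordinate`, `align`, `displacements`, `stress`) and the toy data are illustrative assumptions.

```python
import math
import random


def stress(pts, dist):
    """Raw stress: sum over pairs of (layout distance - target distance)^2."""
    n = len(pts)
    return sum((math.dist(pts[i], pts[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def ordinate(dist, seed, epochs=60):
    """Hypothetical stand-in for an ordination step (not the paper's algorithm):
    place objects at X,Y coordinates by pairwise stress reduction from a random start."""
    rng = random.Random(seed)
    n = len(dist)
    pts = [[rng.random(), rng.random()] for _ in range(n)]  # random starting condition
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for epoch in range(epochs):
        mu = max(0.01, 1.0 - epoch / epochs)  # step size decays from 1 toward 0.01
        rng.shuffle(pairs)
        for i, j in pairs:
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            r = math.hypot(dx, dy) or 1e-9
            # Move both points along their joining line so that the pair's
            # distance goes from r to r - mu * (r - target).
            m = mu * (r - dist[i][j]) / (2 * r)
            pts[i][0] -= m * dx
            pts[i][1] -= m * dy
            pts[j][0] += m * dx
            pts[j][1] += m * dy
    return [tuple(p) for p in pts]


def normalize(pts):
    """Center a layout and scale it to unit root-mean-square radius."""
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    centered = [(x - cx, y - cy) for x, y in pts]
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / rms, y / rms) for x, y in centered]


def align(ref, pts):
    """2-D orthogonal Procrustes: rotate, and reflect if that fits better,
    the normalized `pts` onto the normalized `ref`."""
    a, best, best_err = normalize(ref), None, math.inf
    for flip in (1.0, -1.0):
        b = [(x, flip * y) for x, y in normalize(pts)]
        # Closed-form optimal rotation angle taking b onto a.
        num = sum(bx * ay - by * ax for (ax, ay), (bx, by) in zip(a, b))
        den = sum(bx * ax + by * ay for (ax, ay), (bx, by) in zip(a, b))
        c, s = math.cos(math.atan2(num, den)), math.sin(math.atan2(num, den))
        rotated = [(c * bx - s * by, s * bx + c * by) for bx, by in b]
        err = sum(math.dist(p, q) ** 2 for p, q in zip(a, rotated))
        if err < best_err:
            best, best_err = rotated, err
    return a, best


def displacements(ref, pts):
    """Per-object distance between two layouts after alignment."""
    a, b = align(ref, pts)
    return [math.dist(p, q) for p, q in zip(a, b)]


if __name__ == "__main__":
    # Made-up toy data: two tight groups of three objects, far apart.
    coords = [(0, 0), (0.3, 0), (0, 0.3), (5, 5), (5.3, 5), (5, 5.3)]
    dist = [[math.dist(p, q) for q in coords] for p in coords]
    # Stability w.r.t. random starting conditions: worst displacement over seeds.
    runs = [ordinate(dist, seed) for seed in range(5)]
    worst = [max(d) for d in zip(*(displacements(runs[0], r) for r in runs[1:]))]
    print([round(w, 3) for w in worst])
    # Stability w.r.t. minor perturbations: jitter each distance by up to 2%.
    rng = random.Random(42)
    noisy = [[0.0] * len(coords) for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            noisy[i][j] = noisy[j][i] = dist[i][j] * (1 + rng.uniform(-0.02, 0.02))
    print([round(s, 3) for s in displacements(runs[0], ordinate(noisy, seed=0))])
```

Objects whose displacements stay small across seeds and under jittered distances sit in stable neighborhoods; an object that occasionally jumps far is the kind of large displacement the abstract treats as useful for interpretation rather than as something to discard. The all-pairs loop costs O(n²) per epoch, so this pure-Python version is meant for toy inputs, not a 6000-gene data set.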