Discovering word senses from text
Abstract
Manually compiled dictionaries usually serve as the source of word sense inventories. However, they often include many rare senses while missing corpus-specific or domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It first discovers a set of tight clusters, called committees, that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We then assign each word to its most similar clusters. After assigning a word to a cluster, we remove from the word's feature vector the features it shares with that cluster. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of the discovered senses.
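The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sparse dictionary vectors, the use of cosine similarity, the `threshold` and `max_senses` parameters, and the function names `cosine`, `centroid`, and `assign_senses` are all assumptions made for illustration, and committee discovery itself is taken as already done.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict: feature -> weight)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(members):
    """Average the feature vectors of a committee's members."""
    total = {}
    for vec in members:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    return {f: w / len(members) for f, w in total.items()}

def assign_senses(word_vec, clusters, threshold=0.2, max_senses=5):
    """Assign a word to its most similar clusters, one sense at a time.

    After each assignment, the features the word shares with the chosen
    cluster are removed, so the next round can surface a less frequent
    sense instead of a duplicate. `threshold` and `max_senses` are
    hypothetical stopping criteria, not values from the paper.
    """
    remaining = dict(word_vec)      # features not yet explained by a sense
    candidates = dict(clusters)     # clusters not yet assigned to this word
    senses = []
    while remaining and candidates and len(senses) < max_senses:
        best = max(candidates, key=lambda name: cosine(remaining, candidates[name]))
        if cosine(remaining, candidates[best]) < threshold:
            break
        senses.append(best)
        for f in candidates[best]:  # drop the overlapping features
            remaining.pop(f, None)
        del candidates[best]
    return senses

# Toy committees (hypothetical data): each is a list of member vectors.
clusters = {
    "industry": centroid([{"build": 1, "worker": 1}, {"build": 1, "manager": 1}]),
    "flora": centroid([{"water": 1, "grow": 1}, {"grow": 1, "leaf": 1}]),
}
plant = {"build": 3, "worker": 2, "grow": 1, "water": 1}
senses = assign_senses(plant, clusters)
```

In this toy example the dominant "industry" features initially swamp the "flora" ones; once the overlapping features are removed, the remaining features match the second cluster strongly, so `senses` contains both clusters in order of assignment.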