
Discovering word senses from text

2002, pp. 613–619


Abstract

Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus- or domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It first discovers a set of tight clusters, called committees, that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We then assign each word to its most similar cluster. After assigning an element to a cluster, we remove from the element the features it shares with that cluster. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
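The assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure or a stopping rule, so cosine similarity and a cutoff of 0.2 are assumptions, and the function names (`cosine`, `centroid`, `assign_senses`) and the toy feature weights are hypothetical. Committee discovery is not covered; the committees are simply given.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Average the committee members' feature vectors, feature by feature."""
    total = {}
    for vec in vectors:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    return {f: w / len(vectors) for f, w in total.items()}


def assign_senses(word_vec, clusters, threshold=0.2):
    """Assign a word to clusters one at a time, most similar first.

    After each assignment, the features that overlap with the chosen
    cluster are removed from the word, so later assignments are driven
    by the features that remain. The threshold is an assumed cutoff.
    """
    remaining = dict(word_vec)
    candidates = dict(clusters)
    senses = []
    while candidates and remaining:
        best = max(candidates, key=lambda name: cosine(remaining, candidates[name]))
        if cosine(remaining, candidates[best]) < threshold:
            break
        senses.append(best)
        for f in candidates[best]:
            remaining.pop(f, None)  # drop features shared with the cluster
        del candidates[best]
    return senses


# Toy data: hypothetical committees and feature weights.
clusters = {
    "fruit": centroid([{"eat": 3, "juice": 2, "ripe": 1}, {"eat": 2, "ripe": 2}]),
    "company": centroid([{"stock": 3, "ceo": 2}, {"stock": 2, "ceo": 1, "hire": 1}]),
    "produce": centroid([{"eat": 2, "ripe": 1}]),
}
apple = {"eat": 4, "juice": 2, "ripe": 1, "stock": 1, "ceo": 1}

senses = assign_senses(apple, clusters)
```

In this toy example the dominant "fruit" features are removed after the first assignment, so the rarer "company" sense becomes the best remaining match, while the near-duplicate "produce" cluster no longer shares any features with the word and is not assigned.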
