Towards a universal wordnet by learning from combined evidence
Citations: top 10% of 2009 papers
Abstract
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Related Papers
- DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary (2009), 68 cited
- Potentials and challenges of WordNet-based pedagogical lexicography: The Transpoetika Dictionary (2012), 6 cited
- iChi (2009)
- G-WordNet: Moving WordNet 3.0 and Its Resources to a Graph Database (2017)