Massively Multilingual Word Embeddings
arXiv (Cornell University), 2016
Abstract
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods.