All that is English may be Hindi: Enhancing language identification through automatic ranking of the likeliness of word borrowing in social media
2017, pp. 2264–2274
Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Abhipsa Basu, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee
Abstract
In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. Measured by the Spearman correlation coefficient, our methods perform more than twice as well (nearly 0.62) at predicting borrowing likeliness as the best-performing baseline (nearly 0.26) reported in the literature. Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88 percent of cases, the annotators felt that the foreign language tag should be replaced by the native language tag, indicating substantial scope for improving automatic language identification systems.
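The evaluation metric named in the abstract, the Spearman correlation coefficient, compares two rankings of the same items. The following is a minimal sketch of how such a score could be computed between a method's borrowing-likeliness scores and a reference ranking. The function names and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (sx * sy)


# Hypothetical toy scores for five candidate words (not data from the paper).
predicted = [0.9, 0.7, 0.4, 0.2, 0.1]   # scores from some ranking method
reference = [0.8, 0.9, 0.3, 0.1, 0.2]   # scores from a reference ranking

rho = spearman(predicted, reference)
print(round(rho, 2))
```

A value of 1 means the two rankings agree exactly, -1 means they are reversed, and values near 0 mean little monotonic agreement. Under this reading, the reported figures of roughly 0.62 and 0.26 say that the proposed methods' rankings track the reference ranking much more closely than the baseline's.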