A Latent Variable Model for Geographic Lexical Variation
Figshare, 2018, pp. 1277–1287
Abstract
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports” or “entertainment” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models.
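To make the idea of "topics rendered differently in each region" concrete, the following is a toy sketch of one way such a generative story could be written down. It is not the paper's specification: the abstract does not state the distributions used, so the uniform topic choice, the additive log-space regional offsets, the isotropic Gaussian over coordinates, and every name and number below (`sample_author`, `VOCAB`, the weights, the region centers) are illustrative assumptions.

```python
import math
import random

# Hypothetical toy vocabulary; not taken from the paper's dataset.
VOCAB = ["game", "team", "movie", "hella", "yall"]

def softmax(scores):
    """Convert log-weights to a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_author(rng, base_topics, regional_offsets, region_centers,
                  region_prior, n_words, spread=1.0):
    """Sample one author's region, location, and words.

    Assumed generative story (illustrative only):
      1. draw a latent region from region_prior;
      2. draw a (lat, lon) from a Gaussian around that region's center;
      3. for each word, draw a topic uniformly, then draw the word from the
         base topic shifted by a region-specific offset in log space.
    """
    region = rng.choices(range(len(region_prior)), weights=region_prior)[0]
    lat = rng.gauss(region_centers[region][0], spread)
    lon = rng.gauss(region_centers[region][1], spread)
    words = []
    for _ in range(n_words):
        topic = rng.randrange(len(base_topics))
        logits = [b + d for b, d in
                  zip(base_topics[topic], regional_offsets[region][topic])]
        words.append(rng.choices(VOCAB, weights=softmax(logits))[0])
    return region, (lat, lon), words

# Two base topics ("sports", "entertainment") as log-weights over VOCAB.
BASE_TOPICS = [
    [2.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
]
# Region 0 boosts "hella", region 1 boosts "yall", in both topics.
REGIONAL_OFFSETS = [
    [[0.0, 0.0, 0.0, 1.5, 0.0], [0.0, 0.0, 0.0, 1.5, 0.0]],
    [[0.0, 0.0, 0.0, 0.0, 1.5], [0.0, 0.0, 0.0, 0.0, 1.5]],
]
REGION_CENTERS = [(37.8, -122.4), (33.7, -84.4)]  # made-up centers
REGION_PRIOR = [0.5, 0.5]

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_author(rng, BASE_TOPICS, REGIONAL_OFFSETS,
                        REGION_CENTERS, REGION_PRIOR, n_words=8))
```

Inference runs this story in reverse: given an author's text, one estimates which region's word distributions explain it best, which is what makes location prediction from raw text possible.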