A TALE OF TWO (SIMILAR) CITIES - Inferring City Similarity through Geo-spatial Query Log Analysis
Abstract
Understanding the backgrounds and interests of the people who are consuming a piece of content, such as a news story, video, or music, is vital for the content producer as well as the advertisers who rely on the content to provide a channel on which to advertise. We extend traditional search-engine query log analysis, which has primarily concentrated on analyzing either single or small groups of queries or users, to examining the complete query stream of very large groups of users: the inhabitants of 13,377 cities across the United States. Query logs can be a good representation of the interests of the city's inhabitants and a useful characterization of the city itself. Further, we demonstrate how query logs can be effectively used to gather city-level statistics sufficient for providing insights into the similarities and differences between cities. Cities that are found to be similar through query analysis correspond well to the similar cities identified by large-scale, time-consuming direct measurement studies, such as those undertaken by the Census Bureau.
1. CENSUS & QUERY LOGS
Understanding the backgrounds and interests of the people who are consuming a piece of content, such as a news story, video, or music, is vital for the content producer as well as the advertisers who rely on the content to provide a channel on which to advertise. A variety of sources for demographic and behavioral information exist today. One of the largest-scale efforts to understand people across the United States is conducted every 10 years by the US Census Bureau. This massive operation, which gathers statistics about population, ethnicity and race, is supplemented by smaller surveys, such as the American Community Survey, that gather a variety of more in-depth information about households. Advertisers often use the high-level information gathered by these surveys to help target their ad campaigns to the most appropriate regions and cities in the US.
In contrast to the Census studies, passive studies of search engine query logs have become common since the introduction of search engines and the massive adoption of the Internet to quickly find information (Jansen and Spink, 2006; Silverstein et al., 1999). These studies provide quantitative data that not only help improve the search engine's results, but also offer a deeper understanding of the user and the user's interests than the data collected by the Census and similar surveys. The goal of our work is to extend techniques and data sources that have commonly been used for understanding single online users (or small groups) to extremely large groups (up to millions of users), a scale usually attempted only by large studies such as those of the Census. We want to determine whether the query stream emanating from groups of users, the inhabitants of 13,377 cities across the United States, is a good representation of the interests of the city's inhabitants, and therefore a useful characterization of the city itself. Figure 1 shows the geographic distribution of the queries analyzed in this study.
Figure 1: Geographic distribution of query samples used
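To make the idea of characterizing a city by its query stream more concrete, the following is a minimal sketch and not the method used in this study. It assumes each city's log is available as a plain list of query strings, represents a city by the relative frequency of its queries, and compares two cities with cosine similarity. The city names, the toy queries, and the helper functions `query_distribution` and `cosine_similarity` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def query_distribution(queries):
    """Map each normalized query string to its relative frequency."""
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_p * norm_q)


# Hypothetical toy logs; real city logs would hold far more queries.
city_logs = {
    "city_a": ["surf report", "tide times", "surf report", "weather"],
    "city_b": ["surf report", "tide times", "weather", "weather"],
    "city_c": ["ski conditions", "snow forecast", "weather"],
}

profiles = {city: query_distribution(qs) for city, qs in city_logs.items()}
sim_ab = cosine_similarity(profiles["city_a"], profiles["city_b"])
sim_ac = cosine_similarity(profiles["city_a"], profiles["city_c"])
print(f"city_a vs city_b: {sim_ab:.3f}")
print(f"city_a vs city_c: {sim_ac:.3f}")
```

In this toy data the two coastal cities share most of their queries and score higher than the coastal and mountain pair. A realistic pipeline would likely need to discount queries that are popular everywhere, which this sketch does not attempt.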