That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets
Top 10% of 2015 papers
Abstract
We propose a novel data augmentation approach to enhance computational behavioral analysis using social media text. In particular, we collect a Twitter corpus of descriptions of annoying behaviors using the #petpeeve hashtags. In the qualitative analysis, we study the language use in these tweets, with a special focus on the fine-grained categories and the geographic variation of the language. In the quantitative analysis, we show that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and that frame-semantic features further boost the performance; that leveraging large lexical embeddings to create additional training instances significantly improves the lexical model; and that incorporating frame-semantic embedding achieves the best overall performance.

* We understand that many people find long titles annoying, so we intentionally use a very long one to help people understand what "pet peeve" means.
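The abstract does not spell out how embeddings are turned into additional training instances. As a rough illustration only, the following sketch shows one common way this kind of augmentation can be done: each word is replaced by its nearest neighbor in embedding space, and the new sentence keeps the original label. The toy vectors, the `min_sim` threshold, the function names, and the `"noise"` label are all assumptions made for this example; they are not taken from the paper and may differ from the authors' actual procedure.

```python
import math

# Toy 2-d word vectors, invented for illustration only (not from the paper).
TOY_EMBEDDINGS = {
    "loud": (0.9, 0.1),
    "noisy": (0.85, 0.15),
    "chewing": (0.1, 0.9),
    "munching": (0.15, 0.85),
    "people": (0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor(word, embeddings, min_sim=0.95):
    """Return the most similar other word, or None if none reaches min_sim."""
    if word not in embeddings:
        return None
    best, best_sim = None, min_sim
    for other, vec in embeddings.items():
        if other == word:
            continue
        sim = cosine(embeddings[word], vec)
        if sim >= best_sim:
            best, best_sim = other, sim
    return best


def augment(tokens, label, embeddings, min_sim=0.95):
    """Create one new labeled instance per token that has a close neighbor.

    Each new instance swaps exactly one token for its nearest neighbor and
    inherits the label of the original instance (an assumption of this sketch).
    """
    augmented = []
    for i, tok in enumerate(tokens):
        sub = nearest_neighbor(tok, embeddings, min_sim)
        if sub is not None:
            augmented.append((tokens[:i] + [sub] + tokens[i + 1:], label))
    return augmented


if __name__ == "__main__":
    # "noise" is a hypothetical category name used only for this demo.
    for new_tokens, new_label in augment(
        ["loud", "chewing", "people"], "noise", TOY_EMBEDDINGS
    ):
        print(" ".join(new_tokens), "->", new_label)
```

In practice the toy dictionary would be replaced by pretrained word vectors, and the similarity threshold trades off the number of new instances against the risk that a substitution changes the category of the behavior being described.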