Improving retrieval of short texts through document expansion
Citations over time: top 1% of 2012 papers
Abstract
Collections containing large numbers of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little term frequency information. Beyond lexical properties, as we show, the proposed technique also helps us model the temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. Relative to established baselines, these experiments show that our proposed document expansion method yields significant improvements in effectiveness. Specifically, it improves the lexical representation of documents and the ability to let time influence retrieval.
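The pseudo-query idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it scores every document against the pseudo-query with Dirichlet-smoothed query likelihood (the smoothing parameter `mu`, the cutoff `k`, and the whitespace tokenizer are illustrative choices), normalizes the top-k scores into weights, and mixes the neighbors' term distributions into an expanded document model. The helper names `expand_document` and `temporal_profile` are hypothetical, and the weighted mean timestamp is only one simple stand-in for the temporal modeling the abstract mentions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; any analyzer would do.
    return text.lower().split()


def query_likelihood(query_terms, doc_counts, doc_len, coll_counts, coll_len, mu):
    """Log P(Q|D) with Dirichlet smoothing against the collection model."""
    score = 0.0
    for t in query_terms:
        p_coll = coll_counts[t] / coll_len
        if p_coll == 0:
            continue  # term unseen in the collection; skip to avoid log(0)
        score += math.log((doc_counts[t] + mu * p_coll) / (doc_len + mu))
    return score


def expand_document(doc_id, docs, k=3, mu=100.0):
    """Submit docs[doc_id] as a pseudo-query; return (expanded model, neighbor weights).

    docs maps an id to a (text, timestamp) pair. k and mu are illustrative defaults.
    """
    counts = {d: Counter(tokenize(text)) for d, (text, _) in docs.items()}
    coll = Counter()
    for c in counts.values():
        coll.update(c)
    coll_len = sum(coll.values())

    # Rank every document against the pseudo-query (a brute-force scan).
    query = tokenize(docs[doc_id][0])
    scored = sorted(
        ((query_likelihood(query, c, sum(c.values()), coll, coll_len, mu), d)
         for d, c in counts.items()),
        reverse=True,
    )
    top = scored[:k]

    # Normalize exp(log score) over the top k, subtracting the max for stability.
    best = top[0][0]
    raw = {d: math.exp(s - best) for s, d in top}
    z = sum(raw.values())
    weights = {d: v / z for d, v in raw.items()}

    # Expanded model: weight-averaged maximum-likelihood term distributions.
    model = Counter()
    for d, w in weights.items():
        n = sum(counts[d].values())
        for t, c in counts[d].items():
            model[t] += w * c / n
    return model, weights


def temporal_profile(weights, docs):
    """Weight-averaged timestamp of the pseudo-query's neighbors."""
    return sum(w * docs[d][1] for d, w in weights.items())


if __name__ == "__main__":
    docs = {
        "t1": ("earthquake hits tokyo", 100),
        "t2": ("tokyo earthquake damage reported", 105),
        "t3": ("strong earthquake near tokyo coast", 110),
        "t4": ("new phone released today", 500),
    }
    model, weights = expand_document("t1", docs, k=3)
    print(sorted(weights))  # t4 shares no terms with t1 and falls outside the top 3
    print(model.most_common(5))
    print(temporal_profile(weights, docs))
```

Normalizing the exponentiated log scores over the top k amounts to treating P(Q|D) as a posterior over the retrieved set under a uniform document prior. The expanded model gains terms such as "damage" that the short original text never contained, which is the intuition behind expansion for sparse documents. A production system would run the pseudo-query against an inverted index rather than scanning every document.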
Related Papers
- Efficient index for retrieving top-k most frequent documents (2010), 26 citations
- Improved TFIDF weighting techniques in document Retrieval (2018), 13 citations
- Document Retrieval Using Deep Learning (2020), 8 citations
- General query expansion techniques for spoken document retrieval (1999), 24 citations
- Document Retrieval Based on Knowledge Acquisition and Merging: A Methodology (2012)