Compression of Inverted Indexes for Fast Query Evaluation
Citations: top 1% of 2002 papers
Abstract
Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of choice of compression scheme on retrieval efficiency. In experiments on large collections of data, we show two surprising results: use of simple byte-aligned codes halves the query evaluation time compared to the most compact Golomb-Rice bitwise compression schemes; and, even when an index fits entirely in memory, byte-aligned codes result in faster query evaluation than does an uncompressed index, emphasising that the cost of transferring data from memory to the CPU cache is less for an appropriately compressed index than for an uncompressed index. Moreover, byte-aligned schemes have only a modest space overhead: the most compact schemes result in indexes that are around 10% of the size of the collection, while a byte-aligned scheme is around 13%. We conclude that fast byte-aligned codes should be used to store integers in inverted lists.
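The abstract compares byte-aligned codes with bitwise Golomb-Rice codes for the integers stored in inverted lists. The sketch below implements one of each in Python so the trade-off can be inspected directly. It rests on several assumptions not stated in the abstract: document numbers are stored as d-gaps (differences between consecutive document numbers, each at least 1); the byte-aligned code uses 7 data bits per byte, low-order group first, with the high bit marking an integer's last byte (one common layout among several); and the caller supplies the Rice parameter `k`. All function names are hypothetical, and this is an illustration, not the paper's implementation.

```python
from typing import List, Tuple


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Turn strictly increasing document numbers (>= 1) into d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Recover document numbers from d-gaps with a running sum."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def vbyte_encode(values: List[int]) -> bytes:
    """Byte-aligned code (assumed layout): 7 data bits per byte, low-order
    group first; the high bit is set only on the last byte of each integer."""
    out = bytearray()
    for v in values:
        while v >= 128:
            out.append(v & 0x7F)  # more bytes follow (high bit clear)
            v >>= 7
        out.append(v | 0x80)      # final byte of this integer
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    values, v, shift = [], 0, 0
    for b in data:
        v |= (b & 0x7F) << shift
        if b & 0x80:              # stop bit: integer is complete
            values.append(v)
            v, shift = 0, 0
        else:
            shift += 7
    return values


def rice_encode(values: List[int], k: int) -> Tuple[bytes, int]:
    """Golomb-Rice code with divisor 2**k for integers >= 1.

    x is written as q = (x - 1) >> k in unary (q one-bits, then a zero),
    followed by the low k bits of x - 1, most significant first.
    Returns the packed bytes and the number of meaningful bits.
    """
    bits: List[int] = []
    for x in values:
        r = x - 1
        bits.extend([1] * (r >> k) + [0])
        bits.extend((r >> j) & 1 for j in range(k - 1, -1, -1))
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out), len(bits)


def rice_decode(data: bytes, nbits: int, k: int) -> List[int]:
    def bit_at(i: int) -> int:
        return (data[i // 8] >> (7 - i % 8)) & 1

    values, i = [], 0
    while i < nbits:
        q = 0
        while bit_at(i):          # unary quotient
            q += 1
            i += 1
        i += 1                    # skip the terminating zero
        r = 0
        for _ in range(k):        # k-bit binary remainder
            r = (r << 1) | bit_at(i)
            i += 1
        values.append((q << k) + r + 1)
    return values


if __name__ == "__main__":
    ids = [3, 7, 11, 23, 29, 36, 40]
    gaps = to_gaps(ids)               # [3, 4, 4, 12, 6, 7, 4]
    vb = vbyte_encode(gaps)
    rb, nbits = rice_encode(gaps, k=2)
    assert from_gaps(vbyte_decode(vb)) == ids
    assert from_gaps(rice_decode(rb, nbits, 2)) == ids
    print(len(vb), len(rb))          # 7 4
```

On this small list the Rice code needs 25 bits (4 bytes) against 7 bytes for the byte-aligned code, illustrating the space advantage of bitwise codes. The speed argument shows up in the decoders' structure: `vbyte_decode` consumes one whole byte per step with a mask and a test, whereas `rice_decode` must extract and branch on individual bits. Interpreted Python will not reproduce the timings reported above; a fair comparison needs a compiled implementation. A common heuristic sets the Golomb divisor to about 0.69 times the average gap, rounded to a power of two for Rice codes.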
Related Papers
- Hybrid bitvector index compression (2007)
- Compressing Inverted Index Using Optimal FastPFOR (2015), cited by 2
- Fast Pattern-Matching via k-bit Filtering Based Text Decomposition (2010), cited by 1
- An image-encrypting algorithm based on bit operation (2003)