0 citations0 references

Space Efficient Data Structures for N-gram Retrieval

AIMS Medical Science2017Vol. 4(4), pp. 426–440

Citations Over Time

Abstract

A significant problem in computer science is the management of large data strings and a great number of works dealing with the specific problem has been published in the scientific literature. In this article, we use a technique to store efficiently biological sequences, making use of data structures like suffix trees and inverted files and also employing techniques like n-grams, in order to improve previous constructions. In our attempt, we drastically reduce the space needed to store the inverted indexes, by representing the substrings that appear more frequently in a more compact inverted index. Our technique is based on n-gram indexing, providing us the extra advantage of indexing sequences that cannot be separated in words. Moreover, our technique combines classical one level with two-level n-gram inverted file indexing. Our results suggest that the new proposed algorithm can compress the data more efficiently than previous attempts.

Related Papers

→ Space efficient linear time construction of suffix arrays(2004)184 cited
→ Lempel–Ziv Factorization Using Less Time & Space(2008)73 cited
→ A time and space efficient data structure for string searching on large texts(1996)27 cited
→ A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets(2004)17 cited
Fast Construction of Suffix Arrays for DNA Strings(2007)