
Dictionary matching and indexing with errors and don't cares

2004, pp. 91–100


Richard Cole, Lee-Ad Gottlieb, Moshe Lewenstein

Abstract

This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.

The performance bounds all have a similar character. For example, for the indexing problem with $n = |t|$ and $m = |p|$, the query time for $k$ substitutions is $O(m + (c_1 \log n)^k / k! + \#\text{matches})$, with a data structure of size $O(n (c_2 \log n)^k / k!)$ and a preprocessing time of $O(n (c_2 \log n)^k / k!)$, where $c_1, c_2 > 1$ are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
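To make the three query types concrete, the following is a brute-force sketch of what each one asks for when up to k mismatches (substitutions) are allowed. It is not the paper's data structure and comes nowhere near the bounds quoted above; it only pins down the expected output of each query. The function names (hamming_within, index_query, dictionary_query, dictionary_matching) are hypothetical, chosen for illustration, and the don't-care variant is not covered.

```python
from typing import List, Tuple


def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if a and b have equal length and differ in at most k positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def index_query(t: str, p: str, k: int) -> List[int]:
    """Indexing: start positions in text t where p matches a substring with <= k mismatches."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if hamming_within(t[i:i + m], p, k)]


def dictionary_query(words: List[str], p: str, k: int) -> List[str]:
    """Dictionary query: dictionary strings that match all of p with <= k mismatches."""
    return [w for w in words if hamming_within(w, p, k)]


def dictionary_matching(words: List[str], p: str, k: int) -> List[Tuple[int, str]]:
    """Dictionary matching: (position, word) pairs where a substring of the long
    query p matches an entire dictionary word with <= k mismatches."""
    hits = []
    for w in words:
        for i in range(len(p) - len(w) + 1):
            if hamming_within(p[i:i + len(w)], w, k):
                hits.append((i, w))
    return hits
```

Each naive query costs time proportional to the total length of the text or dictionary times the pattern length, which is the cost that preprocessing is meant to avoid.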
