Approximate Boyer–Moore String Matching
Top 10% of 1993 papers by citations
Abstract
The Boyer–Moore idea used in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. The generalized Boyer–Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time $O\left(kn\left(\frac{1}{m-k} + \frac{k}{c}\right)\right)$, where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with at most k differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than earlier ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer–Moore algorithm when $k = 0$.
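The following is a simplified sketch of a Horspool-style shift generalized to the k mismatches problem. It is written under stated assumptions and is not claimed to reproduce the paper's exact procedure. The function name `k_mismatch_search` and its table layout are hypothetical. The idea is that the last k+1 text characters under the current window must contain at least one match in any later occurrence, so the window may safely skip every shift that would leave all of them mismatched. Each window is then verified by a separate naive scan. The sketch assumes 0 ≤ k < m.

```python
def k_mismatch_search(text: str, pattern: str, k: int) -> list[int]:
    """Return start indices where pattern occurs in text with at most k mismatches.

    Hypothetical sketch of a Boyer-Moore-Horspool style k-mismatches search;
    assumes 0 <= k < len(pattern). Not the paper's exact implementation.
    """
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("require 0 <= k < len(pattern)")
    max_shift = m - k          # shifts below this are checked via the tables
    first = m - k - 1          # first of the last k+1 pattern positions

    # For each pattern position i in [first, m-1], map a character a to the
    # smallest s >= 1 with pattern[i - s] == a. Iterating s upward means
    # setdefault keeps the smallest such s.
    tables = []
    for i in range(first, m):
        table: dict[str, int] = {}
        for s in range(1, i + 1):
            table.setdefault(pattern[i - s], s)
        tables.append(table)

    matches = []
    j = 0
    while j <= n - m:
        # Verify the window right to left, stopping after k + 1 mismatches.
        mismatches = 0
        for i in range(m - 1, -1, -1):
            if text[j + i] != pattern[i]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            matches.append(j)

        # Any shift s < max_shift that misaligns all of the last k+1 text
        # characters would give k+1 mismatches, so take the smallest shift
        # that realigns at least one of them (capped at m - k).
        shift = max_shift
        for offset, table in enumerate(tables):
            shift = min(shift, table.get(text[j + first + offset], max_shift))
        j += shift
    return matches


print(k_mismatch_search("abcabdabc", "abc", 0))  # [0, 6]
print(k_mismatch_search("abcabdabc", "abc", 1))  # [0, 3, 6]
```

With k = 0 only the table for the last pattern position is built, and the shift reduces to Horspool's rule: the distance from the last pattern position back to the rightmost earlier occurrence of the text character, or m if there is none. Separating verification from shift computation keeps the safety argument easy to check, at the cost of rescanning the tail of each window.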