CloSpan: Mining Closed Sequential Patterns in Large Datasets
Top 1% of 2003 papers by citations
Abstract
Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a min-sup threshold in a sequence database. However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining generates an explosive number of frequent subsequences for long patterns, which is prohibitively expensive in both time and space. In this paper, we propose an alternative but equally powerful solution: instead of mining the complete set of frequent subsequences, we mine only frequent closed subsequences, i.e., those that have no super-sequence with the same support (i.e., occurrence frequency). By exploring novel global optimization techniques, we develop an efficient algorithm, called CloSpan (Closed Sequential pattern mining), which outperforms previous work by an order of magnitude. Moreover, CloSpan can mine very long sequences that, to the best of our knowledge, previous algorithms cannot mine. Finally, CloSpan discovers significantly fewer sequences than the traditional (i.e., full-set) methods while preserving the same expressive power, since the whole set of frequent subsequences, together with their supports, can be easily derived from our mining results.
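To make the notion of a closed sequential pattern concrete, here is a brute-force Python sketch. It is not CloSpan's algorithm, which is designed to avoid enumerating every subsequence. This sketch simply enumerates all subsequences of a toy database, keeps the frequent ones, filters them down to the closed ones, and then recovers the support of every frequent pattern from the closed set alone. That recovery illustrates the abstract's final claim: a frequent pattern's support equals the largest support among the closed patterns that contain it. Assumptions: each sequence element is a single item (the general setting allows itemsets), support is the number of database sequences containing the pattern, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical helpers for illustration only; elements are single items (assumption).

def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator past the match, preserving order.
    return all(item in it for item in pattern)

def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)

def frequent_subsequences(database, min_sup):
    """Brute force: enumerate every subsequence of every input sequence."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    supports = {p: support(p, database) for p in candidates}
    return {p: s for p, s in supports.items() if s >= min_sup}

def closed_patterns(frequent):
    """Keep patterns that have no proper super-sequence with equal support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in frequent.items())
    }

def derive_support(pattern, closed):
    """Support of a frequent pattern = max support of its closed super-sequences."""
    return max((s for q, s in closed.items() if is_subsequence(pattern, q)),
               default=0)

db = ["abc", "abc", "ab"]
freq = frequent_subsequences(db, min_sup=2)
closed = closed_patterns(freq)
print(len(freq), sorted(closed.items()))
# 7 [(('a', 'b'), 3), (('a', 'b', 'c'), 2)]
print(derive_support(("a", "c"), closed))  # 2
```

In this toy database, seven frequent subsequences collapse to two closed ones, yet every frequent pattern's support is still recoverable, e.g. `('a', 'c')` gets support 2 from its closed super-sequence `('a', 'b', 'c')`.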