Fast and memory efficient mining of frequent closed itemsets
Citations Over TimeTop 1% of 2006 papers
Abstract
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database and adopts a particular visit and partitioning strategy of the search space based on an original theoretical framework, which formalizes the problem of closed itemsets mining in detail. The algorithm adopts several optimizations aimed to save both space and time in computing itemset closures and their supports. In particular, since one of the main problems in this type of algorithms is the multiple generation of the same closed itemset, we propose a new effective and memory-efficient pruning technique, which, unlike other previous proposals, does not require the whole set of closed patterns mined so far to be kept in the main memory. This technique also permits each visited partition of the search space to be mined independently in any order and, thus, also in parallel. The tests conducted on many publicly available data sets show that our algorithm is scalable and outperforms other state-of-the-art algorithms like CLOSET+ and FP-CLOSE, in some cases by more than one order of magnitude. More importantly, the performance improvements become more and more significant as the support threshold is decreased.
Related Papers
- → Sequence-Growth: A Scalable and Effective Frequent Itemset Mining Algorithm for Big Data Based on MapReduce Framework(2015)28 cited
- → SILVERBACK+: scalable association mining via fast list intersection for columnar social data(2016)4 cited
- A Dynamic Scalable Asynchronous Message Model Based on Distributed Objects(2002)
- A BLEND OF ALGORITHMS RSA AND BIT, ADDITIVE-DIFFERENCEOPERATIONS AND ALGORITHMS IN EL-GAMAL ENCRYPTIONDECRYPTIONIMAGES(2013)
- Partition-based algorithm for mining high utility long itemsets(2007)