Title :
Finding Robust Itemsets under Subsampling
Author :
Tatti, Nikolaj ; Moerchen, Fabian
Author_Institution :
Univ. of Antwerp, Antwerp, Belgium
Abstract :
Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an item set such as closed ness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties: closed, free, non-derivable and totally shattered item sets, demonstrating how we can compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and the patterns reported are simply a subset of all patterns with this property as opposed to approximate patterns for which the property does not really hold. If the underlying property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust item sets. We further derive a parameter-free technique for ranking item sets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.
Keywords :
data mining; frequent pattern mining; noise model; null hypothesis; parameter free technique; pattern explosion; pattern reduction techniques; robust itemsets; subsampling; top-k approaches; Data mining; Itemsets; Polynomials; Random variables; Robustness; TV; Vectors; closed itemsets; free itemsets; non-derivable itemsets; pattern reduction; robust itemsets; totally shattered itemsets;
Conference_Titel :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver,BC
Print_ISBN :
978-1-4577-2075-8
DOI :
10.1109/ICDM.2011.69