Title :
Geometrically Inspired Itemset Mining
Author :
Verhein, Florian ; Chawla, Sanjay
Author_Institution :
Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW
Abstract :
In our geometric view, an itemset is a vector (itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (o) and measures (f and F). For frequent itemset mining (FIM), g and F are identity transformations, o is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FP-growth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments).
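As an illustration of the FIM instantiation described in the abstract (g and F as identity, o as intersection, f as cardinality), the following is a minimal sketch. It is not the authors' algorithm: it is a plain depth-first enumeration over transaction-id sets and makes no claim to the linear-space or single-pass properties stated above. All names (`build_itemvectors`, `mine`, `min_support`) are hypothetical, and representing an itemvector as a set of transaction ids is an assumption made for this example.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Itemvector = FrozenSet[int]  # assumption: an itemvector is a set of transaction ids


def g(vec: Itemvector) -> Itemvector:
    """Transformation applied to each itemvector; identity for FIM."""
    return vec


def o(a: Itemvector, b: Itemvector) -> Itemvector:
    """Aggregation operator; intersection for FIM."""
    return a & b


def f(vec: Itemvector) -> int:
    """Measure on an aggregated itemvector; cardinality (support count) for FIM."""
    return len(vec)


def F(value: int) -> int:
    """Outer measure; identity for FIM."""
    return value


def build_itemvectors(transactions: Iterable[Set[str]]) -> Dict[str, Itemvector]:
    """Map each single item to the ids of the transactions that contain it."""
    vectors: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors.setdefault(item, set()).add(tid)
    return {item: frozenset(tids) for item, tids in vectors.items()}


def mine(transactions: List[Set[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Depth-first enumeration of itemsets whose support is at least min_support.

    Illustrative only; this is a simple stand-in, not the paper's algorithm.
    """
    vectors = {item: g(vec) for item, vec in build_itemvectors(transactions).items()}
    items = sorted(item for item, vec in vectors.items() if F(f(vec)) >= min_support)
    result: Dict[Tuple[str, ...], int] = {}

    def extend(prefix: Tuple[str, ...], prefix_vec: Optional[Itemvector], start: int) -> None:
        for idx in range(start, len(items)):
            item = items[idx]
            # Combine the prefix's itemvector with the next item's itemvector.
            vec = vectors[item] if prefix_vec is None else o(prefix_vec, vectors[item])
            support = F(f(vec))
            if support >= min_support:
                itemset = prefix + (item,)
                result[itemset] = support
                extend(itemset, vec, idx + 1)

    extend((), None, 0)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, support in sorted(mine(data, min_support=2).items()):
        print(itemset, support)
```

Swapping in different definitions of `g`, `o`, `f` and `F` is the point of the abstraction; the enumeration code itself does not need to know which interestingness measure is in use.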
Keywords :
data mining; vectors; FIM; algebraic aggregation operator; frequent itemset mining; geometrically inspired itemset mining; linear transformations; nonlinear transformations; Association rules; Australia Council; Data mining; Extraterrestrial measurements; Information technology; Itemsets; Performance evaluation; Size measurement; Space technology; Vectors;
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2701-7
DOI :
10.1109/ICDM.2006.75