Title :
Cross-Mining Binary and Numerical Attributes
Author :
Garriga, Gemma C. ; Heikinheimo, Hannes ; Seppänen, Jouni K.
Author_Institution :
Helsinki Univ. of Technol., Helsinki
Abstract :
We consider the problem of relating itemsets mined on binary attributes of a data set to numerical attributes of the same data. An example is biogeographical data, where the numerical attributes correspond to environmental variables and the binary attributes encode the presence or absence of species in different environments. From the viewpoint of itemset mining, the task is to select a small collection of interesting itemsets using the numerical attributes; from the viewpoint of the numerical attributes, the task is to constrain the search for local patterns (e.g., clusters) using the binary attributes. We give a formal definition of the problem, discuss it theoretically, give a simple constant-factor approximation algorithm, and show by experiments on biogeographical data that the algorithm can capture interesting patterns that would not have been found using either itemset mining or clustering alone.
Keywords :
approximation theory; data mining; binary attributes encode; biogeographical data; constant-factor approximation algorithm; cross-mining binary; data set; itemset mining; Approximation algorithms; Bioinformatics; Birds; Clustering algorithms; Data mining; Demography; Information science; Itemsets; Motion pictures; Temperature
Conference_Title :
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3018-5
DOI :
10.1109/ICDM.2007.32