Title :
Quantitative Association Analysis Using Tree Hierarchies
Author :
Pan, Feng ; Yang, Lynda ; McMillan, Leonard ; Villena, F. ; Threadgill, David ; Wang, Wei
Author_Institution :
Dept. of Comput. Sci., Univ. of North Carolina at Chapel Hill, Chapel Hill, NC
Abstract :
Association analysis arises in many important applications such as bioinformatics and business intelligence. Given a large collection of measurements over a set of samples, association analysis aims to find dependencies of target variables on subsets of measurements. Most previous algorithms adopt a two-stage approach: they first group samples based on the similarity in the subset of measurements, and then they examine the association between these groups and the specified target variables without considering the inter-group similarities or alternative groupings. This can lead to cases where the strength of association depends significantly on arbitrary clustering choices. In this paper, we propose a tree-based method for quantitative association analysis. Tree hierarchies derived from sample similarities represent many possible sample groupings. They also provide a natural way to incorporate domain knowledge such as ontologies and to identify and remove outliers. Given a tree hierarchy, our association analysis evaluates all possible groupings and selects the one with the strongest association to the target variable. We introduce an efficient algorithm, TreeQA, to systematically explore the search space of all possible groupings in a set of input trees, with integrated permutation tests. Experimental results show that TreeQA is able to handle large-scale association analysis very efficiently and is more effective and robust in association analysis than previous methods.
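Illustrative sketch (not from the paper): the idea of scoring every grouping that a tree hierarchy implies, and then checking the best score with a permutation test, can be illustrated as follows. This is a brute-force toy under stated assumptions and is not the TreeQA algorithm itself, whose efficient search is not described in the abstract. The one-way ANOVA F-statistic as the association score, the nested-tuple tree encoding, and the names `leaves`, `groupings`, `f_statistic`, `best_grouping`, and `permutation_p_value` are all assumptions made for illustration.

```python
import random
from itertools import product

def leaves(node):
    """Sample indices under a node; a leaf is a plain int."""
    if isinstance(node, int):
        return [node]
    return [i for child in node for i in leaves(child)]

def groupings(node):
    """Yield every grouping obtained by cutting the tree at or below `node`.

    Assumes each internal node (a tuple) has at least two children.
    """
    yield [leaves(node)]  # keep the whole subtree as a single group
    if not isinstance(node, int):
        # or split at this node and combine one grouping per child
        for combo in product(*(list(groupings(c)) for c in node)):
            yield [g for part in combo for g in part]

def f_statistic(groups, y):
    """One-way ANOVA F-statistic of target y over the groups (assumed score)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    if k < 2 or n <= k:
        return float("-inf")  # F is undefined for these groupings
    grand = sum(y[i] for g in groups for i in g) / n
    ssb = ssw = 0.0
    for g in groups:
        mean = sum(y[i] for i in g) / len(g)
        ssb += len(g) * (mean - grand) ** 2
        ssw += sum((y[i] - mean) ** 2 for i in g)
    if ssw == 0:
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))

def best_grouping(tree, y):
    """Exhaustively pick the grouping with the highest score."""
    return max(groupings(tree), key=lambda g: f_statistic(g, y))

def permutation_p_value(tree, y, n_perm=200, seed=0):
    """Permutation p-value for the best score, re-running the search
    on each shuffled target so the search itself is accounted for."""
    rng = random.Random(seed)
    observed = f_statistic(best_grouping(tree, y), y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if f_statistic(best_grouping(tree, y_perm), y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: six samples in a hypothetical hierarchy, one target value each.
tree = (((0, 1), (2, 3)), (4, 5))
y = [1.0, 1.1, 1.0, 1.2, 5.0, 5.1]
best = best_grouping(tree, y)
p_value = permutation_p_value(tree, y)
```

The number of groupings grows exponentially with tree size, so this enumeration is only practical for very small trees; it is meant to show what "all possible groupings" means, not how to search them efficiently.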
Keywords :
ontologies (artificial intelligence); pattern clustering; sensor fusion; set theory; trees (mathematics); TreeQA; association analysis; bioinformatics; business intelligence; intergroup similarities; quantitative association analysis; search-space; tree hierarchies; tree-based method; Analysis of variance; Application software; Bioinformatics; Computer science; Data mining; Discrete wavelet transforms; Filters; Genetics; Ontologies; Testing; Association Analysis; Tree Hierarchies;
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3502-9
DOI :
10.1109/ICDM.2008.100