Title :
High Confidence Rule Mining for Microarray Analysis
Author :
McIntosh, Tara ; Chawla, Sanjay
Author_Institution :
Univ. of Sydney, Sydney
Abstract :
We present an association rule mining method for mining high-confidence rules, which describe interesting gene relationships from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms rely on the support measure to prune infrequent relationships and so reduce the search space. This reliance on support is a major shortcoming: it prunes many potentially interesting rules that have low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high-confidence rules from microarray data. MaxConf is a support-free algorithm that directly uses the confidence measure to effectively prune the search space. Experiments on three microarray data sets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those found by support-based methods.
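The distinction between support and confidence drawn in the abstract can be illustrated with a minimal sketch. This is not the MaxConf algorithm, whose details the abstract does not give; it only computes the two standard measures on a made-up toy data set. The gene names, the rows in `EXPERIMENTS`, and the helper names `support` and `confidence` are all assumptions introduced for illustration.

```python
from fractions import Fraction

# Hypothetical toy data: each row is one experiment, listing the genes
# treated as "expressed" in it. Gene names are invented for illustration.
EXPERIMENTS = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g3", "g4"},
    {"g3", "g4", "g5"},
    {"g3"},
    {"g3", "g5"},
    {"g1", "g3"},
    {"g6", "g7"},
    {"g3", "g5"},
    {"g2", "g3"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= row for row in rows), len(rows))


def confidence(antecedent, consequent, rows):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent_support = support(antecedent, rows)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs in the data")
    both = set(antecedent) | set(consequent)
    return support(both, rows) / antecedent_support


if __name__ == "__main__":
    # Two rules: one rare but fully reliable, one frequent but unreliable.
    for ante, cons in [({"g6"}, {"g7"}), ({"g3"}, {"g5"})]:
        s = support(ante | cons, EXPERIMENTS)
        c = confidence(ante, cons, EXPERIMENTS)
        print(f"{sorted(ante)} -> {sorted(cons)}: support={s}, confidence={c}")
```

In this toy data the rule g6 -> g7 holds in every row where g6 appears but occurs in only one of ten rows, so any minimum-support threshold above 1/10 would discard it, while the more frequent rule g3 -> g5 would be kept despite its lower confidence. This is the kind of low-support, high-confidence rule the abstract says support-based pruning loses.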
Keywords :
biology computing; data analysis; data mining; genetics; query processing; tree data structures; MaxConf row-enumeration tree rule mining algorithm; association rule mining; biological analysis; gene relationship; high-confidence rule mining; infrequent relationship pruning; microarray data analysis; rule extraction; search space reduction; support-based rule mining; support-free algorithm; Data mining; association rules; high confidence rule mining; microarray analysis; Algorithms; Cluster Analysis; Computational Biology; Data Interpretation, Statistical; Gene Expression Profiling; Iron; Models, Genetic; Models, Statistical; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; RNA, Messenger; Reproducibility of Results; Sequence Analysis, DNA
Journal_Title :
IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI :
10.1109/tcbb.2007.1050