Title : 
Mining Top-K Frequent Closed Patterns from Gene Expression Data
         
        
            Author : 
Shufan Ji ; Xuejiao Wang ; Yi Zong ; Xiaopeng Gao
         
        
            Author_Institution : 
Comput. Coll., Beihang Univ., Beijing, China
         
        
        
        
        
        
            Abstract : 
Analyzing microarray gene expression data gives biologists deep insight into gene functions and gene regulatory networks. In this paper, we propose FCPminer, a novel and efficient algorithm that mines the top-k frequent closed patterns (FCPs) with the highest support and length no less than minL from gene expression data. FCPminer employs a prefix fp-tree data structure with a top-down, best-first search strategy, so that FCPs of adequate length with the highest support are mined first. Compared with existing top-k FCP mining algorithms, FCPminer is much more efficient because it avoids expanding nodes with inadequate length (less than minL) or low support (ranked below the top k) during the mining process. FCPminer further improves mining efficiency by employing a hash-based closedness checking method. Experimental results on real biological and synthetic data show that FCPminer outperforms existing state-of-the-art algorithms, especially on large and dense datasets.
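
To make the mining task concrete, the following is a minimal brute-force sketch of what "top-k frequent closed patterns with length no less than minL" means. It is not FCPminer: it uses no prefix fp-tree, no best-first search, and no hash-based closedness check, and it is exponential in the number of rows. The function name `top_k_closed_patterns`, the parameter names, and the toy gene labels are assumptions made for illustration only. It relies on the standard fact that a pattern is closed exactly when it equals the intersection of all rows that contain it.

```python
from itertools import combinations


def top_k_closed_patterns(rows, k, min_len):
    """Brute-force reference: top-k closed patterns with at least min_len items.

    Illustrative only; this is NOT the FCPminer algorithm.
    """
    rows = [frozenset(r) for r in rows]
    closed = {}
    # Every intersection of a non-empty group of rows is a closed pattern,
    # and every closed pattern arises this way.
    for size in range(1, len(rows) + 1):
        for group in combinations(rows, size):
            pattern = frozenset.intersection(*group)
            if len(pattern) < min_len:
                continue  # too short: below the minL threshold
            # Support counts every row containing the pattern, not just the group.
            closed[pattern] = sum(1 for r in rows if pattern <= r)
    # Highest support first; ties broken by item names for a stable order.
    ranked = sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [(sorted(p), s) for p, s in ranked[:k]]


# Toy data: each row is the set of genes expressed in one sample (hypothetical).
toy_rows = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3", "g4"},
    {"g2", "g3"},
]
result = top_k_closed_patterns(toy_rows, k=2, min_len=2)
print(result)
```

On this toy input, the single item g2 has the highest support but is dropped by the length constraint, which is the kind of candidate the abstract says FCPminer avoids expanding.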
         
        
            Keywords : 
bioinformatics; data mining; file organisation; genetics; trees (mathematics); FCPminer; hash-based closedness checking method; microarray gene expression data; prefix fp-tree; top-k frequent closed pattern mining; Buildings; Complexity theory; Data mining; Gene expression; Itemsets; Search problems; Space exploration;
         
        
        
        
            Conference_Title : 
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
         
        
            Conference_Location : 
Shenzhen
         
        
            Print_ISBN : 
978-1-4799-4275-6
         
        
        
            DOI : 
10.1109/ICDMW.2014.61