DocumentCode :
3717298
Title :
Parallel information fusion method for microarray data analysis
Author :
Jun Meng;Rui Li;Jing Zhang
Author_Institution :
School of Computer Science and Technology, Dalian University of Technology, Dalian, China
fYear :
2015
Firstpage :
1539
Lastpage :
1544
Abstract :
Classification of microarray data has always been a challenging task because of the enormous number of genes involved. Finding a small set of closely related genes that accurately classifies disease cells is an important research problem. Integrating biological knowledge into genomic analysis is an effective way to improve the interpretation of the results. In this paper, the affinity propagation (AP) clustering algorithm is chosen to analyze the impact of biological similarity on the results. We integrate GO semantic similarity into AP clustering for granule construction. Using the MapReduce programming model, a parallel information fusion method is proposed. The construction of the similarity matrix and the message passing in the AP algorithm are parallelized using MapReduce. A parallel randomly directed hill climb ensemble pruning (RandomDHCEP) method based on MapReduce is introduced for ensemble pruning. An instance analysis illustrates the process of affinity propagation and ensemble pruning using an iterative MapReduce program. The proposed method scales well on large data as the number of nodes increases, and it provides higher classification accuracy than using the whole gene set for classification.
Keywords :
"Clustering algorithms","Partitioning algorithms","Programming","Data models","Classification algorithms","Semantics","Algorithm design and analysis"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363917
Filename :
7363917