DocumentCode :
1784840
Title :
Semi-supervised imputation for microarray missing value estimation
Author :
Hui-Hui Li ; Feng-Feng Shao ; Guo-Zheng Li
Author_Institution :
Dept. of Control Sci. & Eng., Tongji Univ., Shanghai, China
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
297
Lastpage :
300
Abstract :
Missing data are unavoidable in gene expression microarray experiments, for many reasons. Because the completeness of the data strongly affects downstream analysis, many methods for estimating missing values have been developed. However, most current methods cannot estimate missing values accurately when the missing rate is large. In this paper, inspired by semi-supervised learning with collaborative training, we propose a new imputation method called COIM (COllaborative IMputation). COIM estimates missing values with a collaborative imputation strategy based on Bayesian principal component analysis (BPCA) and local least squares (LLS). It exploits both global correlation information and local structure in the incomplete dataset by having BPCA and LLS share their estimates with each other. In addition, COIM recovers genes with fewer missing entries first. Numerical results show that COIM outperforms the compared algorithms in terms of normalized root mean square error (NRMSE), especially on datasets with large missing rates or few complete genes.
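The collaborative idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so everything below is an assumption made for illustration. In particular, an iterative truncated-SVD fill stands in for BPCA, the local least squares step is simplified, the final averaging of the two estimates is a guess at one way to combine them, and the names `global_lowrank_impute`, `local_ls_impute`, `co_impute`, and `nrmse` are hypothetical. NRMSE is computed as RMSE over the missing entries divided by the standard deviation of the true values there, a definition commonly used in this literature but not stated in the abstract.

```python
import numpy as np

def nrmse(truth, estimate, missing_mask):
    """RMSE over the masked entries, divided by the std of the true values there."""
    t, e = truth[missing_mask], estimate[missing_mask]
    return np.sqrt(np.mean((e - t) ** 2)) / np.std(t)

def global_lowrank_impute(x, mask, init, rank=2, n_iter=50):
    """Stand-in for the global model (NOT BPCA): iterative truncated-SVD fill."""
    filled = np.where(mask, init, x)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        filled = np.where(mask, (u[:, :rank] * s[:rank]) @ vt[:rank], x)
    return filled

def local_ls_impute(x, mask, reference, k=5):
    """Simplified local least squares; missing neighbour values come from `reference`.

    Assumes every gene (row) has at least one observed value.
    """
    filled = np.where(mask, reference, x)
    # Recover genes with fewer missing entries first.
    for i in np.argsort(mask.sum(axis=1)):
        miss = mask[i]
        if not miss.any():
            continue
        obs = ~miss
        others = np.delete(np.arange(x.shape[0]), i)
        # k nearest genes, measured on the columns observed for gene i.
        dist = np.linalg.norm(filled[others][:, obs] - x[i, obs], axis=1)
        nbrs = others[np.argsort(dist)[:k]]
        # Express gene i as a linear combination of its neighbours.
        w, *_ = np.linalg.lstsq(filled[nbrs][:, obs].T, x[i, obs], rcond=None)
        filled[i, miss] = w @ filled[nbrs][:, miss]
    return filled

def co_impute(x, n_rounds=3, rank=2, k=5):
    """Alternate the two imputers, each starting from the other's estimates."""
    mask = np.isnan(x)
    init = np.broadcast_to(np.nanmean(x, axis=1, keepdims=True), x.shape)
    glob = global_lowrank_impute(x, mask, init, rank)
    loc = glob
    for _ in range(n_rounds):
        loc = local_ls_impute(x, mask, reference=glob, k=k)
        glob = global_lowrank_impute(x, mask, init=loc, rank=rank)
    # Assumption: combine the two views by simple averaging.
    return np.where(mask, 0.5 * (glob + loc), x)
```

To try it, pass `co_impute` a genes-by-samples array with `np.nan` at the missing positions, then score the result with `nrmse` against entries that were held out on purpose.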
Keywords :
Bayes methods; bioinformatics; data analysis; data integrity; genetic algorithms; genetics; learning (artificial intelligence); least mean squares methods; principal component analysis; BPCA; Bayesian principal component analysis; COIM; collaborative imputation strategy; collaborative training; data missing; downstream analysis; gene expression microarray experiments; gene recovery; global correlation information; high-estimation precision; local least squares; microarray missing value estimation; normalized root mean square error; semisupervised imputation; semisupervised learning; Collaboration; Correlation; Estimation; Gene expression; Least squares approximations; Microarray gene expression data; large missing rate; missing value imputation; semi-supervised learning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999172
Filename :
6999172