Title :
Missing value estimation for DNA microarray gene expression data with principal curves
Author :
Shi, Jinlong ; Luo, Zhigang
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Computational analysis of gene expression data has become an essential approach for understanding cellular activities and identifying gene function. However, expression profiles generated by high-throughput microarray experiments often contain missing values, which significantly affect the performance of subsequent statistical analysis and machine learning algorithms, so these missing values need to be estimated as accurately as possible. Many estimation algorithms exist, but each has its flaws. This paper proposes a missing value estimation method based on the principal curve, a nonlinear generalization of the first linear principal component. By finding self-consistent, smooth, one-dimensional curves that pass through the "middle" of a multidimensional data set, the principal curve can integrate the linear and nonlinear relationships between genes and reveal the distribution of genes. Based on this framework of all the expression profiles, missing values can be estimated more accurately. To assess the performance of the method, it is compared with recently proposed estimation algorithms on several microarray data sets. The results show that our method provides a better solution for the estimation of missing values in DNA microarray gene expression data.
Keywords :
DNA; bioinformatics; cellular biophysics; lab-on-a-chip; principal component analysis; DNA microarray gene expression; cellular activity; expression profile; gene function identification; linear principal component analysis; machine learning algorithm; missing value estimation; principal curve; Condition monitoring; DNA computing; Gene expression; Humans; Machine learning algorithms; Matrix decomposition; Multidimensional systems; Pattern analysis; Principal component analysis; Statistical analysis; estimation; microarray; missing value; principal curve;
Conference_Title :
Bioinformatics and Biomedical Technology (ICBBT), 2010 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6775-4
DOI :
10.1109/ICBBT.2010.5478964