Title :
Dealing with missing values in microarray data
Author :
Mohammadi, Azadeh ; Saraee, Mohammad Hossein
Author_Institution :
Dept. of Electr. & Comput. Eng., Isfahan Univ. of Technol., Isfahan
Abstract :
Gene expression profiling plays an important role in a broad range of areas in biology. Raw gene expression data may contain missing values. Accurately estimating missing values in microarray data is an important preprocessing step, because many expression profile analyses require complete datasets. Numerous methods have been developed to deal with missing values. In this paper, a new and robust method based on fuzzy clustering and gene ontology is proposed to estimate missing values in microarray data. In the proposed method, missing values are imputed with values generated from cluster centers. To determine similar genes in the clustering process, we use biological knowledge obtained from gene ontology as well as gene expression values. We applied the proposed method to yeast cell cycle data and yeast environmental stress data, with different percentages of missing entries, and compared its estimation accuracy with that of several other methods. The experimental results indicate that the proposed method outperforms the other methods in terms of accuracy.
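The idea of imputing missing entries from fuzzy cluster centers can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain fuzzy c-means with distances computed over observed entries only, it omits the gene ontology similarity term entirely, and it fills each missing entry with a membership-weighted average of the cluster centers. The function name `fcm_impute` and all of its parameters are hypothetical.

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Fill NaNs in X (genes x conditions) from fuzzy c-means centers.

    Illustrative sketch only: expression values alone drive the clustering;
    no gene ontology information is used.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                 # True where a value was measured
    W = observed.astype(float)              # 0/1 mask used as weights
    X0 = np.where(observed, X, 0.0)         # placeholder zeros, masked by W
    n, d = X.shape

    # Initialise centers from randomly chosen rows, with gaps filled by column means.
    col_mean = X0.sum(axis=0) / np.maximum(W.sum(axis=0), 1.0)
    start = np.where(observed, X, col_mean)
    rng = np.random.default_rng(seed)
    centers = start[rng.choice(n, size=n_clusters, replace=False)].copy()

    n_obs = np.maximum(W.sum(axis=1), 1.0)  # observed entries per gene
    for _ in range(n_iter):
        # Squared distance over observed dimensions only, rescaled to d dimensions.
        diff2 = (X0[:, None, :] - centers[None, :, :]) ** 2
        dist2 = (diff2 * W[:, None, :]).sum(axis=2) * (d / n_obs)[:, None]
        dist2 = np.maximum(dist2, 1e-12)    # avoid division by zero

        # Standard fuzzy membership update; each row of U sums to 1.
        inv = dist2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

        # Center update that ignores missing entries.
        Um = U ** m
        num = Um.T @ X0
        den = Um.T @ W
        centers = np.where(den > 0, num / np.maximum(den, 1e-12), centers)

    estimate = U @ centers                  # membership-weighted centers
    return np.where(observed, X, estimate)  # keep measured values untouched

# Toy example: two well-separated groups of genes, two missing entries.
X = np.array([[1.0, 1.2, 0.9],
              [1.1, np.nan, 1.0],
              [5.0, 5.2, 4.9],
              [5.1, 5.0, np.nan]])
filled = fcm_impute(X, n_clusters=2)
```

Measured values are returned unchanged, and each imputed value is a convex combination of the cluster centers, so it stays within the range of the observed values in its column.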
Keywords :
bioinformatics; data handling; fuzzy set theory; genetics; ontologies (artificial intelligence); pattern clustering; biology computing; fuzzy clustering; gene expression profiling; gene ontology; microarray data missing value; yeast cell cycle data; yeast environmental stress data; Biology; Clustering algorithms; Diseases; Drugs; Fungi; Gene expression; Ontologies; Pharmaceutical technology; Robustness; Stress; fuzzy clustering; gene expression; gene ontology; microarray; missing values;
Conference_Title :
Emerging Technologies, 2008. ICET 2008. 4th International Conference on
Conference_Location :
Rawalpindi
Print_ISBN :
978-1-4244-2210-4
Electronic_ISBN :
978-1-4244-2211-1
DOI :
10.1109/ICET.2008.4777511