Title :
Combination of KNN-Based Feature Selection and KNN-Based Missing-Value Imputation of Microarray Data
Author :
Meesad, Phayung ; Hengpraprohm, Kairung
Author_Institution :
Dept. of Teacher Training in Electr. Eng., King Mongkut's Univ. of Technol., Bangkok
Abstract :
Microarrays are a useful biological resource for studying living organisms at the molecular level. Microarray data sets usually have only a few samples but high dimensionality, with many missing values, which makes downstream analysis less efficient. This paper proposes a methodology to impute missing values in microarray data. The proposed method, KNNFS impute, combines KNN-based feature selection with KNN-based imputation and comprises two main steps: feature selection and estimation of the missing values. The proposed method is compared with traditional KNN imputation and row-average imputation for estimating missing values on three microarray data sets: lung tumor, colon cancer, and ALL-AML leukemia. Estimation accuracy is measured by the normalized root mean squared error (NRMSE), where a lower value indicates a better estimate. The results show that the proposed method achieves a smaller NRMSE than the compared methods on all three data sets.
Keywords :
data handling; mean square error methods; medical computing; KNN-based missing-value imputation; KNN-based feature selection; microarray data; minimum normalized root mean squared error; row average methods; Cancer; Colon; DNA; Data mining; Educational technology; Gene expression; Image resolution; Information technology; Lung neoplasms; Organisms
Conference_Title :
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
DOI :
10.1109/ICICIC.2008.635