DocumentCode :
2770255
Title :
A comparative study of missing value estimation methods: Which method performs better?
Author :
Eng Aik Lim ; Zainuddin, Zarita
Author_Institution :
Institute of Eng. Mathematics, Univ. Malaysia Perlis, Kuala Perlis
fYear :
2008
fDate :
1-3 Dec. 2008
Firstpage :
1
Lastpage :
5
Abstract :
Missing data is a problem that permeates much of the research being done today. Some data, such as gene expression data, frequently contain missing values, yet most downstream analyses of microarray experiments require complete data. Many methods have been proposed in the literature to estimate missing values from the correlation patterns within the data matrix. In this report we evaluate three leading current methods (one neural network method and two imputation methods) on multiple types of data, including microarray data and time series data such as air pollutant data and phytoplankton data. Based on the overall performance of each method, we then determine the most appropriate method to apply across various data sets. We found that the two best-performing methods, local least squares imputation (LLS) and Bayesian principal component analysis (BPCA), are highly competitive with each other in overall results. We also tested the radial basis function (RBF) network method, one of the neural network methods, and found that its overall performance is lower than that of the BPCA and LLS methods. According to the overall normalized root mean square error (NRMSE) of the three methods, the BPCA method provides the most accurate estimation of missing values.
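The abstract ranks the methods by NRMSE but does not define it. The following minimal sketch assumes the convention common in the missing-value literature: the root mean square error over the artificially hidden entries, divided by the standard deviation of their true values. The helper names `nrmse` and `row_mean_impute`, the synthetic data, and the 5% masking rate are all hypothetical. The row-mean imputer is a simple stand-in baseline, not LLS, BPCA, or the RBF network evaluated in the paper.

```python
import numpy as np

def nrmse(true_values, estimates):
    """NRMSE, assumed here to be RMSE divided by the std of the true values."""
    true_values = np.asarray(true_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    rmse = np.sqrt(np.mean((estimates - true_values) ** 2))
    return rmse / np.std(true_values)

def row_mean_impute(data):
    """Hypothetical baseline: fill each NaN with the mean of its row's observed values."""
    filled = data.copy()
    row_means = np.nanmean(data, axis=1)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = row_means[rows]
    return filled

# Hide a random subset of entries in a complete (synthetic) matrix,
# impute them, then score only the hidden entries.
rng = np.random.default_rng(0)
complete = rng.normal(size=(50, 10))
mask = rng.random(complete.shape) < 0.05  # hide roughly 5% of entries
observed = complete.copy()
observed[mask] = np.nan

imputed = row_mean_impute(observed)
score = nrmse(complete[mask], imputed[mask])
print(f"NRMSE of row-mean baseline: {score:.3f}")
```

Scoring only the masked entries mirrors the usual protocol of deleting known values and comparing each method's estimates against them. A lower NRMSE indicates a more accurate imputation method.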
Keywords :
Bayes methods; estimation theory; least squares approximations; principal component analysis; radial basis function networks; Bayesian principal component analysis; air pollutant data; local least squares imputation; microarray data; missing value estimation; neural network method; phytoplankton data; radial basis function network; time series data; Air pollution; Bayesian methods; Clustering algorithms; Clustering methods; Data analysis; Gene expression; Least squares methods; Mathematics; Neural networks; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electronic Design, 2008. ICED 2008. International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4244-2315-6
Electronic_ISBN :
978-1-4244-2315-6
Type :
conf
DOI :
10.1109/ICED.2008.4786656
Filename :
4786656