Title :
iDMI: A novel technique for missing value imputation using a decision tree and expectation-maximization algorithm
Author :
Rahman, Md Geaur ; Islam, Md Zahurul
Author_Institution :
Center for Research in Complex Systems (CRiCS), Charles Sturt University, Bathurst, NSW, Australia
Abstract :
In this paper we present a novel technique called iDMI that imputes the missing values of a data set by combining a decision tree (DT) algorithm and an expectation-maximization imputation (EMI) algorithm. We first divide a data set into horizontal segments by applying a DT algorithm such as C4.5, and then apply an EMI algorithm to each segment in order to impute the missing values that belong to that segment. If all numerical attribute values of a record are missing, we impute them with the mean values of the attributes over the records belonging to the segment in which the record falls. This reduces the computational time complexity of iDMI compared to an existing technique called DMI, which calculates the mean value of an attribute using all records of the data set. We evaluate the performance of iDMI against three high-quality existing techniques on two real data sets in terms of four evaluation criteria. Our initial experimental results, including several statistical significance analyses, indicate the superiority of iDMI over the existing techniques.
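The segment-then-impute idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the decision tree step is abstracted away, so each record's segment is given as a hypothetical leaf id in `leaf_ids`; a plain multivariate-normal EM imputation (with a tiny `ridge` term added for numerical stability, an assumption of this sketch) stands in for the EMI algorithm; and the function names `em_impute` and `idmi_like_impute` are hypothetical. Records whose numerical values are all missing receive the means of their own segment rather than of the whole data set, as the abstract describes.

```python
import numpy as np


def em_impute(X, n_iter=50, ridge=1e-6):
    """Fill NaNs in X (n x d) with plain Gaussian EM (stand-in for EMI).

    Assumes every row and every column has at least one observed value.
    `ridge` is a small diagonal term added to the covariance for stability
    (an assumption of this sketch, not taken from the paper).
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)  # start from mean-filled data
    sigma = np.cov(filled, rowvar=False, bias=True).reshape(d, d) + ridge * np.eye(d)
    for _ in range(n_iter):
        filled = X.copy()
        extra = np.zeros((d, d))  # conditional covariance of the missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # Sigma_mo @ inv(Sigma_oo)
            # E-step: conditional mean of the missing values given the observed ones
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: re-estimate the mean and covariance from the completed data
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + extra) / n + ridge * np.eye(d)
    return filled


def idmi_like_impute(X, leaf_ids, n_iter=50):
    """Segment-wise imputation in the spirit of iDMI (hypothetical helper).

    `leaf_ids[i]` is the decision-tree leaf (segment) of record i; building
    the tree itself is outside this sketch. Assumes each segment has at least
    one observed value per column.
    """
    X = np.asarray(X, dtype=float)
    leaf_ids = np.asarray(leaf_ids)
    out = X.copy()
    for leaf in np.unique(leaf_ids):
        rows = np.flatnonzero(leaf_ids == leaf)
        seg = X[rows]
        # Segment means (not whole-data-set means) for fully missing records
        seg_mean = np.nanmean(seg, axis=0)
        all_missing = np.isnan(seg).all(axis=1)
        out[rows[all_missing]] = seg_mean
        partial = rows[~all_missing]
        if len(partial) > 0:
            out[partial] = em_impute(X[partial], n_iter=n_iter)
    return out
```

As one possible way to obtain `leaf_ids`, scikit-learn's `DecisionTreeClassifier.apply` returns the leaf index of each record, though that estimator implements CART rather than C4.5. Restricting both the EM fit and the fallback means to one segment keeps each computation local to records that the tree has grouped as similar.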
Keywords :
data mining; decision trees; expectation-maximisation algorithm; C4.5 algorithm; DT algorithm; EM algorithm; decision tree; expectation-maximization algorithm; iDMI technique; missing value imputation; statistical significance analysis; Accuracy; Computers; Correlation; Electromagnetic interference; Information technology; Remuneration; Data pre-processing; data cleansing
Conference_Title :
2013 16th International Conference on Computer and Information Technology (ICCIT)
Conference_Location :
Khulna
DOI :
10.1109/ICCITechn.2014.6997351