Title :
Handling missing values via decomposition of the conditioned set
Author :
Shyu, Mei-Ling ; Kuruppu-Appuhamilage, Indika Priyantha ; Chen, Shu-Ching ; Chang, LiWu
Author_Institution :
Dept. of Electr. & Comput. Eng., Miami Univ., Coral Gables, FL, USA
Abstract :
Because real-world databases are seldom complete, this paper proposes a framework for replacing missing values in a database. Good data quality can directly improve the performance of data mining algorithms in various applications. The proposed framework adopts basic concepts from conditional probability theory and develops an algorithm that handles both nominal and numerical values, addressing the inability of existing algorithms to handle both types with a high degree of accuracy. Several experiments are conducted, and the results demonstrate that the framework provides high accuracy when compared with most of the commonly used algorithms, such as replacing missing values with the average, maximum, or minimum value.
Keywords :
data mining; database management systems; probability; conditional probability theory; conditioned set decomposition; data mining algorithm; data quality; missing values handling; real-world database; Cleaning; Computer science; Data mining; Data preprocessing; Distributed computing; Distributed databases; Information systems; Laboratories; Multimedia databases; Multimedia systems;
Conference_Title :
IEEE International Conference on Information Reuse and Integration, 2005 (IRI-2005).
Print_ISBN :
0-7803-9093-8
DOI :
10.1109/IRI-05.2005.1506473