Title :
A Comparison of global and local probabilistic approximations in mining data with many missing attribute values
Author :
Clark, Patrick G. ; Grzymala-Busse, Jerzy W.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
Abstract :
We present results of a novel experimental comparison of global and local probabilistic approximations. Global approximations are unions of characteristic sets, while local approximations are constructed from blocks of attribute-value pairs. Two interpretations of missing attribute values are discussed: lost values and “do not care” conditions. Our main objective was to compare global and local probabilistic approximations in terms of the error rate. For our experiments we used six incomplete data sets with many missing attribute values. The best results were accomplished by global approximations for two data sets and by local approximations for one data set; for the remaining three data sets the experiments ended with ties. Our next objective was to check the quality of non-standard probabilistic approximations, i.e., probabilistic approximations that were neither lower nor upper approximations. For four data sets the smallest error rate was accomplished by non-standard probabilistic approximations; for the remaining two data sets it was accomplished by upper approximations. Our final objective was to compare the two interpretations of missing attribute values. For three data sets the best interpretation was the lost value, for one data set it was the “do not care” condition, and for the remaining two data sets there was a tie.
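The notions named in the abstract can be illustrated with a small sketch. It is not taken from the paper: the toy table, the function names, and the symbols `?` (lost value) and `*` (“do not care” condition) are assumptions made for illustration, and the approximation shown is one common concept-style formulation (the union of the characteristic sets of the concept's cases whose conditional probability reaches a threshold `alpha`). Local approximations are not sketched here.

```python
from typing import Dict, List, Set, Tuple

LOST, DO_NOT_CARE = "?", "*"  # assumed markers for the two interpretations

Case = Dict[str, str]


def attribute_value_blocks(cases: List[Case], attributes: List[str]) -> Dict[Tuple[str, str], Set[int]]:
    """Block of (a, v): cases whose value of a is v, plus every "do not care" case.
    Cases with a lost value for a are left out of all blocks of a."""
    blocks: Dict[Tuple[str, str], Set[int]] = {}
    for a in attributes:
        values = {c[a] for c in cases if c[a] not in (LOST, DO_NOT_CARE)}
        for v in values:
            blocks[(a, v)] = {i for i, c in enumerate(cases) if c[a] in (v, DO_NOT_CARE)}
    return blocks


def characteristic_set(i: int, cases: List[Case], attributes: List[str],
                       blocks: Dict[Tuple[str, str], Set[int]]) -> Set[int]:
    """Intersect the blocks of the specified values of case i; a missing value
    (lost or "do not care") contributes the whole universe, i.e. no restriction."""
    k = set(range(len(cases)))
    for a in attributes:
        v = cases[i][a]
        if v not in (LOST, DO_NOT_CARE):
            k &= blocks[(a, v)]
    return k


def global_probabilistic_approximation(concept: Set[int], cases: List[Case],
                                       attributes: List[str], alpha: float) -> Set[int]:
    """Union of characteristic sets K(x), x in the concept, with Pr(concept | K(x)) >= alpha."""
    blocks = attribute_value_blocks(cases, attributes)
    result: Set[int] = set()
    for i in concept:
        k = characteristic_set(i, cases, attributes, blocks)
        if len(k & concept) / len(k) >= alpha:
            result |= k
    return result


# Hypothetical toy data: case 1 has a lost value, case 2 a "do not care" condition.
CASES = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": "?", "Headache": "yes"},
    {"Temperature": "high", "Headache": "*"},
    {"Temperature": "normal", "Headache": "no"},
]
ATTRIBUTES = ["Temperature", "Headache"]
CONCEPT = {0, 1}  # assumed concept, e.g. the cases labeled Flu = yes

lower = global_probabilistic_approximation(CONCEPT, CASES, ATTRIBUTES, alpha=1.0)
middle = global_probabilistic_approximation(CONCEPT, CASES, ATTRIBUTES, alpha=0.6)
```

Under these assumptions, `alpha = 1` corresponds to a lower approximation, a very small positive `alpha` to an upper approximation, and values in between to the non-standard probabilistic approximations mentioned in the abstract.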
Keywords :
approximation theory; data mining; probability; rough set theory; attribute-value pairs; characteristic sets unions; error rate; global probabilistic approximations; local probabilistic approximations; missing attribute values; nonstandard probabilistic approximations; Approximation algorithms; Approximation methods; Educational institutions; Error analysis; Probabilistic logic; MLEM2 rule induction algorithm; parameterized approximations; probabilistic approximations;
Conference_Title :
Granular Computing (GrC), 2013 IEEE International Conference on
Conference_Location :
Beijing
DOI :
10.1109/GrC.2013.6740384