DocumentCode
3301212
Title
A Comparison of global and local probabilistic approximations in mining data with many missing attribute values
Author
Clark, Patrick G. ; Grzymala-Busse, Jerzy W.
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
fYear
2013
fDate
13-15 Dec. 2013
Firstpage
76
Lastpage
81
Abstract
We present results of a novel experimental comparison of global and local probabilistic approximations. Global approximations are unions of characteristic sets, while local approximations are constructed from blocks of attribute-value pairs. Two interpretations of missing attribute values are discussed: lost values and “do not care” conditions. Our main objective was to compare global and local probabilistic approximations in terms of the error rate. For our experiments we used six incomplete data sets with many missing attribute values. The best results were accomplished by global approximations for two data sets and by local approximations for one data set; for the remaining three data sets the experiments ended in ties. Our next objective was to check the quality of non-standard probabilistic approximations, i.e., probabilistic approximations that are neither lower nor upper approximations. For four data sets the smallest error rate was accomplished by non-standard probabilistic approximations; for the remaining two data sets it was accomplished by upper approximations. Our final objective was to compare the two interpretations of missing attribute values. For three data sets the best interpretation was the lost value, for one data set it was the “do not care” condition, and for the remaining two data sets there was a tie.
Keywords
approximation theory; data mining; probability; rough set theory; attribute-value pairs; characteristic sets unions; error rate; global probabilistic approximations; local probabilistic approximations; missing attribute values; nonstandard probabilistic approximations; Approximation algorithms; Approximation methods; Educational institutions; Error analysis; Probabilistic logic; MLEM2 rule induction algorithm; parameterized approximations; probabilistic approximations
fLanguage
English
Publisher
IEEE
Conference_Titel
Granular Computing (GrC), 2013 IEEE International Conference on
Conference_Location
Beijing
Type
conf
DOI
10.1109/GrC.2013.6740384
Filename
6740384