• DocumentCode
    3301212
  • Title

    A Comparison of global and local probabilistic approximations in mining data with many missing attribute values

  • Author

    Clark, Patrick G. ; Grzymala-Busse, Jerzy W.

  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
  • fYear
    2013
  • fDate
    13-15 Dec. 2013
  • Firstpage
    76
  • Lastpage
    81
  • Abstract
    We present results of a novel experimental comparison of global and local probabilistic approximations. Global approximations are unions of characteristic sets, while local approximations are constructed from blocks of attribute-value pairs. Two interpretations of missing attribute values are discussed: lost values and “do not care” conditions. Our main objective was to compare global and local probabilistic approximations in terms of the error rate. For our experiments we used six incomplete data sets with many missing attribute values. The best results were accomplished by global approximations for two data sets and by local approximations for one data set; for the remaining three data sets the experiments ended in ties. Our next objective was to check the quality of non-standard probabilistic approximations, i.e., probabilistic approximations that were neither lower nor upper approximations. For four data sets the smallest error rate was accomplished by non-standard probabilistic approximations; for the remaining two data sets the smallest error rate was accomplished by upper approximations. Our final objective was to compare the two interpretations of missing attribute values. For three data sets the best interpretation was the lost value, for one data set it was the “do not care” condition, and for the remaining two data sets there was a tie.
  • Keywords
    approximation theory; data mining; probability; rough set theory; attribute-value pairs; characteristic sets unions; error rate; global probabilistic approximations; local probabilistic approximations; missing attribute values; nonstandard probabilistic approximations; Approximation algorithms; Approximation methods; Educational institutions; Error analysis; Probabilistic logic; MLEM2 rule induction algorithm; parameterized approximations; probabilistic approximations
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Granular Computing (GrC), 2013 IEEE International Conference on
  • Conference_Location
    Beijing
  • Type

    conf

  • DOI
    10.1109/GrC.2013.6740384
  • Filename
    6740384