  • DocumentCode
    1220916
  • Title
    An extended Chi2 algorithm for discretization of real value attributes
  • Author
    Su, Chao-Ton; Hsu, Jyh-Hwa
  • Author_Institution
    Dept. of Ind. Eng. & Eng. Manage., Nat. Tsing Hua Univ., Hsinchu, Taiwan
  • Volume
    17
  • Issue
    3
  • fYear
    2005
  • fDate
    3/1/2005
  • Firstpage
    437
  • Lastpage
    441
  • Abstract
    The variable precision rough sets (VPRS) model is a powerful tool for data mining and has been widely applied to knowledge acquisition. Despite its diverse applications in many domains, the VPRS model cannot be applied directly to real-world classification tasks involving continuous attributes; such data must first be preprocessed by a discretization method. Discretization is an effective technique for handling continuous attributes in data mining, especially for classification problems. The modified Chi2 algorithm is a variant of the Chi2 algorithm that replaces the original inconsistency check with the quality of approximation, a measure from rough sets theory (RST), and that takes the effect of degrees of freedom into account. However, classification with a controlled degree of uncertainty, or misclassification error, lies outside the realm of RST, and the modified Chi2 algorithm also ignores the effect of variance in the two intervals being merged. In this study, we propose a new algorithm, named the extended Chi2 algorithm, to overcome these two drawbacks. Experiments run with the See5 software show that the proposed algorithm performs better than both the original and the modified Chi2 algorithms.
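For readers unfamiliar with the two criteria the abstract contrasts, the Python sketch below illustrates them under stated assumptions; it is not the authors' extended Chi2 algorithm. `chi2_adjacent` computes the χ² statistic that ChiMerge-style methods (including the Chi2 family) use to judge whether two adjacent intervals should be merged, treating the two intervals as rows and the classes as columns of a 2 × k contingency table; replacing a zero expected count with 0.1 follows the usual ChiMerge convention and is an assumption here. `quality_of_approximation` computes the classical rough-set ratio |POS|/|U|, the share of objects whose discretized condition values determine a single decision, which is the measure the modified Chi2 algorithm uses in place of the inconsistency check. All function names are hypothetical, and neither the VPRS misclassification threshold nor the variance term of the extended algorithm is modeled.

```python
from collections import Counter, defaultdict


def chi2_adjacent(counts_a, counts_b):
    """Chi-square statistic for two adjacent intervals.

    counts_a, counts_b: per-class example counts in each interval,
    aligned by class index (both of length k). Rows of the 2 x k
    contingency table are the intervals; columns are the classes.
    """
    rows = [counts_a, counts_b]
    k = len(counts_a)
    row_sums = [sum(r) for r in rows]
    col_sums = [counts_a[j] + counts_b[j] for j in range(k)]
    n = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(k):
            expected = row_sums[i] * col_sums[j] / n
            if expected == 0:
                # Assumed ChiMerge-style convention: avoid division by zero.
                expected = 0.1
            chi2 += (rows[i][j] - expected) ** 2 / expected
    return chi2


def lowest_chi2_pair(interval_counts):
    """Index i of the adjacent pair (i, i + 1) with the smallest chi-square,
    i.e. the most similar class distributions and the best merge candidate."""
    scores = [
        chi2_adjacent(interval_counts[i], interval_counts[i + 1])
        for i in range(len(interval_counts) - 1)
    ]
    return min(range(len(scores)), key=scores.__getitem__)


def quality_of_approximation(condition_values, decisions):
    """Classical rough-set gamma = |POS| / |U|.

    condition_values: one hashable tuple of discretized attribute values
    per object; decisions: the class label of each object. An equivalence
    class (objects with identical condition values) belongs to the positive
    region only if all of its objects share one decision.
    """
    labels = defaultdict(set)
    sizes = Counter()
    for x, d in zip(condition_values, decisions):
        labels[x].add(d)
        sizes[x] += 1
    positive = sum(sizes[x] for x, ds in labels.items() if len(ds) == 1)
    return positive / len(decisions)


if __name__ == "__main__":
    # Three intervals over two classes; the first two are most alike.
    print(lowest_chi2_pair([[5, 0], [4, 1], [0, 5]]))  # -> 0
    print(quality_of_approximation([(0,), (0,), (1,), (1,)], ["a", "a", "a", "b"]))  # -> 0.5
```

A full discretizer would repeatedly merge the adjacent pair returned by `lowest_chi2_pair` while the chosen consistency measure stays above a threshold; that loop and the significance-level schedule of Chi2 are omitted from this sketch.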
  • Keywords
    computational complexity; data integrity; data mining; learning (artificial intelligence); pattern classification; rough set theory; statistical analysis; modified Chi2 algorithm; real value attributes discretization; variable precision rough sets model; Approximation algorithms; Chaos; Classification algorithms; Data mining; Entropy; Error correction; Rough sets; Software algorithms; Software performance; Uncertainty;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.39
  • Filename
    1388252