• DocumentCode
    840519
  • Title

    A Distribution-Index-Based Discretizer for Decision-Making with Symbolic AI Approaches

  • Author

    Wu, QingXiang ; Bell, David A. ; Prasad, Girijesh ; McGinnity, Thomas Martin

  • Author_Institution
    Sch. of Comput. Sci., Queen's Univ., Belfast
  • Volume
    19
  • Issue
    1
  • fYear
    2007
  • Firstpage
    17
  • Lastpage
    28
  • Abstract
    When symbolic AI approaches are applied to handle continuous-valued attributes, there is a requirement to transform the continuous attribute values to symbolic data. In this paper, a novel distribution-index-based discretizer is proposed for such a transformation. Based on definitions of dichotomic entropy and a compound distributional index, a simple criterion is applied to discretize continuous attributes adaptively. The dichotomic entropy indicates the homogeneity degree of the decision value distribution, and is applied to determine the best splitting point. The compound distributional index combines both the homogeneity degrees of attribute value distributions and the decision value distribution, and is applied to determine which interval should be split further; thus, a potentially improved solution of the discretization problem can be found efficiently. Based on multiple reducts in rough set theory, a multiknowledge approach can attain high decision accuracy for information systems with a large number of attributes and missing values. In this paper, our discretizer is combined with the multiknowledge approach to further improve decision accuracy for information systems with continuous attributes. Experimental results on benchmark data sets show that the new discretizer can improve not only the multiknowledge approach, but also the naive Bayes classifier and the C5.0 tree.
  • Keywords
    data mining; database indexing; decision making; learning (artificial intelligence); C5.0 tree; attribute value distribution; benchmark data set; compound distributional index; decision value distribution; decision-making; dichotomic entropy; distribution-index-based discretizer; homogeneity degree; multiknowledge approach; naive Bayes classifier; rough set theory; symbolic AI approach; Artificial intelligence; Bayesian methods; Classification tree analysis; Distributed decision making; Entropy; Information systems; Machine learning; Set theory; Testing; decision support; information theory
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2007.250582
  • Filename
    4016512