DocumentCode
840519
Title
A Distribution-Index-Based Discretizer for Decision-Making with Symbolic AI Approaches
Author
Wu, QingXiang ; Bell, David A. ; Prasad, Girijesh ; McGinnity, Thomas Martin
Author_Institution
Sch. of Comput. Sci., Queen's Univ., Belfast
Volume
19
Issue
1
fYear
2007
Firstpage
17
Lastpage
28
Abstract
When symbolic AI approaches are applied to handle continuous valued attributes, there is a requirement to transform the continuous attribute values to symbolic data. In this paper, a novel distribution-index-based discretizer is proposed for such a transformation. Based on definitions of dichotomic entropy and a compound distributional index, a simple criterion is applied to discretize continuous attributes adaptively. The dichotomic entropy indicates the homogeneity degree of the decision value distribution, and is applied to determine the best splitting point. The compound distributional index combines both the homogeneity degrees of attribute value distributions and the decision value distribution, and is applied to determine which interval should be split further; thus, a potentially improved solution of the discretization problem can be found efficiently. Based on multiple reducts in rough set theory, a multiknowledge approach can attain high decision accuracy for information systems with a large number of attributes and missing values. In this paper, our discretizer is combined with the multiknowledge approach to further improve decision accuracy for information systems with continuous attributes. Experimental results on benchmark data sets show that the new discretizer can improve not only the multiknowledge approach, but also the naive Bayes classifier and the C5.0 tree.
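The split-point step described in the abstract can be illustrated with a minimal sketch. It assumes that the dichotomic entropy behaves like the familiar size-weighted class entropy of a single binary cut; the paper's exact definition is not given in this record and may differ. The compound distributional index, which decides which interval to split next, is not reproduced because its formula is not stated here. The function names `class_entropy` and `best_binary_split` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def class_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the decision-value distribution in `labels`.

    Lower entropy means a more homogeneous decision-value distribution.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # p * log2(1/p) is non-negative, so a pure interval yields exactly 0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def best_binary_split(values: Sequence[float],
                      labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (cut_point, weighted_entropy) for the single cut that minimises
    the size-weighted entropy of the two resulting sub-intervals.

    ASSUMPTION: this generic weighted-entropy criterion stands in for the
    paper's dichotomic entropy, whose exact definition is not given here.
    Candidate cuts are midpoints between adjacent distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut: Optional[float] = None
    best_h = math.inf
    for i in range(1, n):
        lo_v, hi_v = pairs[i - 1][0], pairs[i][0]
        if lo_v == hi_v:
            continue  # cannot cut between two equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (i / n) * class_entropy(left) + ((n - i) / n) * class_entropy(right)
        if h < best_h:
            best_cut, best_h = (lo_v + hi_v) / 2, h
    if best_cut is None:
        raise ValueError("need at least two distinct attribute values")
    return best_cut, best_h


if __name__ == "__main__":
    cut, h = best_binary_split([1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
                               ["a", "a", "a", "b", "b", "b"])
    print(cut, h)
```

On this toy attribute, the cut at 6.5 separates the two decision values perfectly, so its weighted entropy is zero. The quadratic scan over candidate cuts is chosen for readability rather than speed; an incremental count update would make each pass linear after sorting.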
Keywords
data mining; database indexing; decision making; learning (artificial intelligence); C5.0 tree; attribute value distribution; benchmark data set; compound distributional index; decision value distribution; decision-making; dichotomic entropy; distribution-index-based discretizer; homogeneity degree; multiknowledge approach; naive Bayes classifier; rough set theory; symbolic AI approach; Artificial intelligence; Bayesian methods; Classification tree analysis; Data mining; Distributed decision making; Entropy; Information systems; Machine learning; Set theory; Testing; Data mining; decision support; information theory; machine learning;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2007.250582
Filename
4016512