DocumentCode :
1514469
Title :
EDISC: A Class-Tailored Discretization Technique for Rule-Based Classification
Author :
Shehzad, Khurram
Author_Institution :
Dept. of Ind. Eng. & Manage. Sci., Univ. of Eng. & Technol., Taxila, Pakistan
Volume :
24
Issue :
8
Year :
2012
Firstpage :
1435
Lastpage :
1447
Abstract :
Discretization is a critical component of data mining whereby continuous attributes of a data set are converted into discrete ones by creating intervals either before or during learning. There are many good reasons for preprocessing discretization, such as increased learning efficiency and classification accuracy, comprehensibility of data mining results, as well as the inherent limitation of a great majority of learning algorithms to handle only discrete data. Many preprocessing discretization techniques have been proposed to date, of which the Entropy-MDLP discretization has been accepted as by far the most effective in the context of both decision tree learning and rule induction algorithms. This paper presents a new discretization technique EDISC which utilizes the entropy-based principle but takes a class-tailored approach to discretization. The technique is applicable in general to any covering algorithm, including those that use the class-per-class rule induction methodology such as CN2 as well as those that use a seed example during the learning phase, such as the RULES family. Experimental evaluation has proved the efficiency and effectiveness of the technique as a preprocessing discretization procedure for CN2 as well as RULES-7, the latest algorithm among the RULES family of inductive learning algorithms.
Keywords :
data handling; data mining; decision trees; entropy; knowledge based systems; learning (artificial intelligence); pattern classification; EDISC; RULES family; class-per-class rule induction methodology; class-tailored discretization technique; continuous data set attributes; data mining; decision tree learning; discrete data handling; entropy-MDLP discretization; entropy-based principle; inductive learning algorithm; preprocessing discretization techniques; rule extraction system; rule induction algorithms; rule-based classification; Accuracy; Algorithm design and analysis; Classification algorithms; Data mining; Decision trees; Entropy; Machine learning algorithms; Discretization; continuous values; data mining; data transformation; discrete values; inductive learning; machine learning; rule induction; supervised learning
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.101
Filename :
5765955