Title :
EDISC: A Class-Tailored Discretization Technique for Rule-Based Classification
Author :
Shehzad, Khurram
Author_Institution :
Dept. of Ind. Eng. & Manage. Sci., Univ. of Eng. & Technol., Taxila, Pakistan
Abstract :
Discretization is a critical component of data mining whereby continuous attributes of a data set are converted into discrete ones by creating intervals either before or during learning. There are many good reasons for preprocessing discretization, such as increased learning efficiency and classification accuracy, comprehensibility of data mining results, and the inherent limitation of the great majority of learning algorithms, which can handle only discrete data. Many preprocessing discretization techniques have been proposed to date, of which Entropy-MDLP discretization has been accepted as by far the most effective in the context of both decision tree learning and rule induction algorithms. This paper presents a new discretization technique, EDISC, which utilizes the entropy-based principle but takes a class-tailored approach to discretization. The technique is applicable in general to any covering algorithm, including those that use the class-per-class rule induction methodology, such as CN2, as well as those that use a seed example during the learning phase, such as the RULES family. Experimental evaluation has proved the efficiency and effectiveness of the technique as a preprocessing discretization procedure for CN2 as well as RULES-7, the latest algorithm among the RULES family of inductive learning algorithms.
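For readers unfamiliar with the Entropy-MDLP baseline mentioned in the abstract, the following is a minimal Python sketch of the standard entropy-based procedure (recursive binary splitting with the Fayyad-Irani MDLP stopping criterion). It is not EDISC, whose class-tailored details the abstract does not give. The function names `entropy`, `best_cut`, `mdlp_accepts`, and `discretize` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, gain, left_labels, right_labels) for the cut point with
    the highest information gain, or None if no cut is possible."""
    pairs = sorted(zip(values, labels))
    ys = [y for _, y in pairs]
    n = len(ys)
    base = entropy(ys)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two identical values
        left, right = ys[:i], ys[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - weighted
        if best is None or gain > best[1]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cut, gain, left, right)
    return best


def mdlp_accepts(labels, gain, left, right):
    """Fayyad-Irani MDLP criterion: accept the cut only if the gain exceeds
    the cost of encoding the partition."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def discretize(values, labels):
    """Recursively collect the cut points accepted by the MDLP criterion."""
    found = best_cut(values, labels)
    if found is None:
        return []
    cut, gain, left, right = found
    if not mdlp_accepts(labels, gain, left, right):
        return []
    lo = [(v, y) for v, y in zip(values, labels) if v <= cut]
    hi = [(v, y) for v, y in zip(values, labels) if v > cut]
    lo_cuts = discretize([v for v, _ in lo], [y for _, y in lo])
    hi_cuts = discretize([v for v, _ in hi], [y for _, y in hi])
    return sorted(lo_cuts + [cut] + hi_cuts)


# Toy data: two well-separated classes along one continuous attribute.
values = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18]
labels = ["a"] * 8 + ["b"] * 8
cuts = discretize(values, labels)
```

On this toy data the procedure keeps a single cut between the two classes and rejects further splits inside the pure intervals. A class-tailored method would, presumably, choose intervals with respect to one target class at a time rather than all classes jointly.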
Keywords :
data handling; data mining; decision trees; entropy; knowledge based systems; learning (artificial intelligence); pattern classification; EDISC; RULES family; class-per-class rule induction methodology; class-tailored discretization technique; continuous data set attributes; data mining; decision tree learning; discrete data handling; entropy-MDLP discretization; entropy-based principle; inductive learning algorithm; preprocessing discretization techniques; rule extraction system; rule induction algorithms; rule-based classification; Accuracy; Algorithm design and analysis; Classification algorithms; Data mining; Decision trees; Entropy; Machine learning algorithms; Discretization; continuous values; data mining; data transformation; discrete values; inductive learning; machine learning; rule induction; supervised learning
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2011.101