DocumentCode :
907022
Title :
A compact and accurate model for classification
Author :
Last, Mark ; Maimon, Oded
Author_Institution :
Dept. of Inf. Syst. Eng., Ben-Gurion Univ. of the Negev, Israel
Volume :
16
Issue :
2
Year :
2004
Firstpage :
203
Lastpage :
215
Abstract :
We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of available features. The relationship between input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across all nodes of a given layer (level). The algorithm selects input attributes incrementally, each time choosing the attribute that maximizes the global decrease in the conditional entropy of the target attribute. We use a prepruning approach: network construction stops when no attribute causes a statistically significant decrease in the entropy. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
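The selection loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dict-per-row data layout, and the `min_gain` threshold are all assumptions made for illustration. In particular, the fixed `min_gain` cutoff is only a stand-in for the statistical significance test, whose details the abstract does not give. Conditioning on the full tuple of selected attributes mirrors the idea that every node in a layer splits on the same attribute.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, target, given):
    """Estimate H(target | given) in bits from frequency counts."""
    n = len(rows)
    joint = Counter((tuple(r[a] for a in given), r[target]) for r in rows)
    marginal = Counter(tuple(r[a] for a in given) for r in rows)
    return -sum(c / n * log2(c / marginal[key]) for (key, _), c in joint.items())


def select_features(rows, candidates, target, min_gain=0.01):
    """Greedily add the attribute that most reduces H(target | selected).

    min_gain is a hypothetical stopping threshold (in bits) standing in
    for a statistical significance test: this is the prepruning step.
    """
    selected = []
    current = conditional_entropy(rows, target, selected)
    remaining = list(candidates)
    while remaining:
        # Score every remaining attribute by the entropy left after adding it.
        scores = {a: conditional_entropy(rows, target, selected + [a])
                  for a in remaining}
        best = min(remaining, key=scores.get)
        gain = current - scores[best]
        if gain < min_gain:
            break  # no attribute helps enough: stop growing the model
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


# Toy data: "a" fully determines "y", while "b" carries no information.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 0},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 1},
]
print(select_features(rows, ["b", "a"], "y"))
```

On this toy data only "a" is kept, because adding "b" afterwards yields no further decrease in entropy.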
Keywords :
data mining; database management systems; decision trees; feature extraction; information theory; pattern classification; classification model; data mining; data-driven induction; databases; decision-tree model; dimensionality reduction; feature selection; information network; information theoretic network; information theory; information-theoretic algorithm; knowledge discovery; prepruning approach; Classification tree analysis; Computer Society; Credit cards; Data mining; Entropy; Information theory; Predictive models; Random variables; Spatial databases; Uncertainty;
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2004.1269598
Filename :
1269598