DocumentCode :
1273414
Title :
Data categorization using decision trellises
Author :
Frasconi, Paolo ; Gori, Marco ; Soda, Giovanni
Author_Institution :
Dept. of Syst. & Inf., Florence Univ., Italy
Volume :
11
Issue :
5
fYear :
1999
Firstpage :
697
Lastpage :
712
Abstract :
We introduce a probabilistic graphical model for supervised learning on databases with categorical attributes. The proposed belief network contains hidden variables that play a role similar to nodes in decision trees and each of their states either corresponds to a class label or to a single attribute test. As a major difference with respect to decision trees, the selection of the attribute to be tested is probabilistic. Thus, the model can be used to assess the probability that a tuple belongs to some class, given the predictive attributes. Unfolding the network along the hidden states dimension yields a trellis structure having a signal flow similar to second order connectionist networks. The network encodes context specific probabilistic independencies to reduce parametric complexity. We present a custom tailored inference algorithm and derive a learning procedure based on the expectation-maximization algorithm. We propose decision trellises as an alternative to decision trees in the context of tuple categorization in databases, which is an important step for building data mining systems. Preliminary experiments on standard machine learning databases are reported, comparing the classification accuracy of decision trellises and decision trees induced by C4.5. In particular, we show that the proposed model can offer significant advantages for sparse databases in which many predictive attributes are missing.
Keywords :
belief networks; data mining; decision trees; deductive databases; inference mechanisms; learning (artificial intelligence); neural nets; optimisation; probability; belief network; categorical attributes; class label; classification accuracy; context specific probabilistic independencies; custom tailored inference algorithm; data categorization; data mining systems; decision trees; decision trellises; expectation-maximization algorithm; hidden states dimension; hidden variables; learning procedure; parametric complexity; predictive attributes; probabilistic graphical model; second order connectionist networks; signal flow; single attribute test; sparse databases; standard machine learning databases; supervised learning; trellis structure; tuple categorization; Data mining; Databases; Decision trees; Expectation-maximization algorithms; Graphical models; Inference algorithms; Machine learning algorithms; Predictive models; Supervised learning; Testing;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/69.806931
Filename :
806931