Title of article
Decision tree approaches for zero-inflated count data
Author/Authors
Seong-Keon Lee & Seohoon Jin
Issue Information
Journal with serial number, year 2006
Pages
13
From page
853
To page
865
Abstract
Many methodologies for zero-inflated data have been developed in the field of
statistics. However, there is little literature in the data mining field, even though zero-inflated data
are easily found in real applications. In fact, there is no decision tree method that is
suitable for zero-inflated responses. To analyze a continuous target variable with decision trees,
one of the data mining techniques, we use the F-statistic (CHAID) or variance reduction (CART)
criterion to find the best split. But these methods are appropriate only for a continuous target
variable. If the target variable consists of rare events or zero-inflated count data, the above criteria
may not give good results because of the nature of such data. In this paper, we propose a decision tree for
zero-inflated count data that uses the maximized zero-inflated Poisson (ZIP) likelihood as the split
criterion. In addition, we compare the performance of the split criteria on well-known data
sets. When the analyst is interested in lower-value groups (e.g. no-defect areas,
customers who do not claim), the suggested ZIP tree would be more efficient.
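
The split criterion described in the abstract can be illustrated with a short sketch. This is not the authors' implementation; it assumes that a candidate split is scored by the sum of the maximized ZIP log-likelihoods of the two child nodes, that the ZIP parameters (zero-inflation probability `pi` and Poisson mean `lam`) are fitted by a simple EM iteration, and that there is a single numeric predictor. The function names `zip_max_loglik` and `best_zip_split` are hypothetical.

```python
import math


def zip_max_loglik(y, n_iter=200, tol=1e-10):
    """Maximized zero-inflated Poisson log-likelihood of the counts y (EM fit)."""
    n = len(y)
    n0 = sum(1 for v in y if v == 0)   # number of observed zeros
    total = sum(y)
    if total == 0:
        return 0.0                     # empty or all-zero node: point mass at zero
    pi = 0.5 * n0 / n                  # starting zero-inflation probability
    lam = total / (n - n0)             # starting Poisson mean (mean of positives)

    def loglik(pi, lam):
        ll = 0.0
        if n0:
            # a zero is either structural (pi) or a Poisson zero
            ll += n0 * math.log(pi + (1 - pi) * math.exp(-lam))
        for v in y:
            if v > 0:
                ll += (math.log(1 - pi) - lam + v * math.log(lam)
                       - math.lgamma(v + 1))
        return ll

    prev = loglik(pi, lam)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is structural
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update both parameters
        pi = n0 * z / n
        lam = total / (n - n0 * z)
        cur = loglik(pi, lam)
        if cur - prev < tol:
            break
        prev = cur
    return cur


def best_zip_split(x, y, min_leaf=1):
    """Return (threshold, score) maximizing the summed child log-likelihoods.

    Assumes min_leaf >= 1 and one numeric predictor x.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(ys) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue                   # cannot split between tied predictor values
        score = zip_max_loglik(ys[:i]) + zip_max_loglik(ys[i:])
        if best is None or score > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, score)
    return best
```

For example, `best_zip_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 9, 10, 11])` places the threshold between the all-zero observations and the large counts. A variance-reduction criterion could be substituted for `zip_max_loglik` to compare the two split rules on the same data.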
Keywords
Data mining, Decision tree, Homogeneity, Maximum likelihood, Zero-inflated Poisson (ZIP)
Journal title
JOURNAL OF APPLIED STATISTICS
Serial Year
2006
Record number
712078