Author/Authors :
Dariusz Piwczyński, Beata Sitkowska, Ewa Wiśniewska
Abstract :
The aim of this research was to statistically analyse the survival of 20,044 Polish
Merino lambs between birth and the 100th day of life, using classification trees and logistic
regression. Lamb survival was expressed on a binary scale: 1 for survival, 0 for
mortality. Two classification tree models were developed, differing in the splitting
criterion: the entropy function or the Gini index. For comparison purposes,
an additional statistical analysis was carried out using multiple logistic regression. The
quality of the decision tree and multiple logistic regression models was compared using
the following criteria: average error function, average squared error, cumulative lift,
the Kolmogorov–Smirnov statistic and the area under the Receiver Operating Characteristic
curve. The statistical analysis was conducted using the Enterprise Miner 6.2 software included
in the SAS package. The calculated quality criteria of the four models that were developed lead
to the conclusion that the classification trees built using the Gini index and the
entropy function are the most accurate in describing the variability of the trait under
examination, i.e. survival of lambs up to 100 days of age. For the best classification
model, i.e. the tree built using the Gini index, the ranking of variable importance,
based on the “Importance” measure, indicates that the
flock, type, and the year of a lamb’s birth are the most significant differentiating factors