DocumentCode :
3165381
Title :
Detecting Fractures in Classifier Performance
Author :
Cieslak, David A. ; Chawla, Nitesh V.
Author_Institution :
Univ. of Notre Dame, Notre Dame
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
123
Lastpage :
132
Abstract :
Many classification algorithms assume that training and testing samples are drawn from the same data distribution - this is the stationary distribution assumption. It entails that the past is strongly indicative of the future. However, in real-world applications, many factors may alter the One True Model responsible for generating the data distribution, both significantly and subtly. When the stationary distribution assumption is violated, traditional validation schemes such as ten-fold cross-validation and hold-out become poor performance predictors and classifier rankers. Thus, it becomes critical to discover the fracture points in classifier performance by discovering the divergence between populations. In this paper, we implement a comprehensive evaluation framework to identify bias, enabling selection of a "correct" classifier given the sample bias. To thoroughly evaluate the performance of classifiers within biased distributions, we consider the following three scenarios: missing completely at random (akin to stationary); missing at random; and missing not at random. The last of these reflects the canonical sample selection bias problem.
Keywords :
data mining; pattern classification; biased distributions; classification algorithms; classifier performance; data distribution; fracture points; stationary distribution assumption; Classification algorithms; Computer science; Data engineering; Data mining; Decision trees; Machine learning; Measurement; Risk management; Testing; Virtual colonoscopy;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3018-5
Type :
conf
DOI :
10.1109/ICDM.2007.106
Filename :
4470236