DocumentCode
2865567
Title
An improved categorization of classifier's sensitivity on sample selection bias
Author
Fan, Wei ; Davidson, Ian ; Zadrozny, Bianca ; Yu, Philip S.
Author_Institution
IBM T.J. Watson Res., Hawthorne, NY, USA
fYear
2005
fDate
27-30 Nov. 2005
Abstract
A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias, in which the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on its class label y. A classifier learner is categorized as "local" if it is insensitive to this type of sample selection bias; otherwise, it is considered "global". In that paper, the true model is not clearly distinguished from the model that the algorithm outputs. In its discussion of Bayesian classifiers, logistic regression and hard-margin SVMs, the true model (the model that generates the true class label for every example) is implicitly assumed to be contained in the model space of the learner, and the true class probabilities and the model's estimated class probabilities are assumed to converge asymptotically as the training set size increases. However, in the discussion of naive Bayes, decision trees and soft-margin SVMs, the model space is assumed not to contain the true model, and these three algorithms are instead argued to be "global learners". We argue that most classifier learners may or may not be affected by sample selection bias; this depends on the dataset as well as on the heuristics or inductive bias implied by the learning algorithm and their appropriateness to the particular dataset.
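The role of model-space containment described in the abstract can be made concrete with a small exact calculation. This is a toy sketch, not taken from the paper: the four-valued feature x, the class probabilities, the selection probabilities, and the two "learners" are all hypothetical. Because selection depends on x only, P(y | x, s=1) = P(y | x); a learner whose model space contains the true posterior (a separate probability per x) recovers it unchanged, while a misspecified learner (a single class probability that ignores x) is pulled toward a different value by the biased sample.

```python
from fractions import Fraction

# Hypothetical toy distribution (illustration only): x in {0,1,2,3} is
# uniform, and TRUE_POSTERIOR[x] is the true P(y=1 | x).
P_X = {x: Fraction(1, 4) for x in range(4)}
TRUE_POSTERIOR = {0: Fraction(1, 10), 1: Fraction(3, 10),
                  2: Fraction(7, 10), 3: Fraction(9, 10)}

# Hypothetical selection probabilities P(s=1 | x): they depend on x only,
# never on y, which is the type of sample selection bias discussed above.
SELECT = {0: Fraction(9, 10), 1: Fraction(7, 10),
          2: Fraction(2, 10), 3: Fraction(1, 10)}
NO_BIAS = {x: Fraction(1) for x in P_X}


def joint_given_selected(select):
    """Exact P(x, y | s=1) for a given selection rule P(s=1 | x)."""
    joint = {}
    for x, px in P_X.items():
        p1 = TRUE_POSTERIOR[x]
        joint[(x, 1)] = px * p1 * select[x]
        joint[(x, 0)] = px * (1 - p1) * select[x]
    total = sum(joint.values())  # equals P(s=1)
    return {key: value / total for key, value in joint.items()}


def fit_table(joint):
    """Well-specified learner: a separate P(y=1 | x) for every x.

    Its estimate equals the true posterior whenever select[x] > 0.
    """
    return {x: joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)]) for x in P_X}


def fit_constant(joint):
    """Misspecified learner: one P(y=1) shared by all x (ignores x)."""
    return sum(value for (_, y), value in joint.items() if y == 1)


if __name__ == "__main__":
    for name, select in (("unbiased", NO_BIAS), ("biased", SELECT)):
        joint = joint_given_selected(select)
        print(name, fit_table(joint) == TRUE_POSTERIOR, fit_constant(joint))
```

With these illustrative numbers, the per-x table returns TRUE_POSTERIOR for both the unbiased and the selected sample, whereas the constant model estimates 1/2 from unbiased data but 53/190 (about 0.28) from the selected sample. Whether a learner behaves "locally" here depends on whether its model space can represent the true posterior for this dataset, not on the learner's name alone.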
Keywords
Bayes methods; decision trees; learning (artificial intelligence); pattern classification; regression analysis; support vector machines; Bayesian classifier; classifier learning; classifier sensitivity categorization; decision trees; hard-margin support vector machine; logistic regression; naive Bayes; sample selection bias; Bayesian methods; Classification tree analysis; Computer science; Data mining; Decision trees; Logistics; Regression tree analysis; Support vector machines; Testing; Training data;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining, Fifth IEEE International Conference on
ISSN
1550-4786
Print_ISBN
0-7695-2278-5
Type
conf
DOI
10.1109/ICDM.2005.24
Filename
1565737