• DocumentCode
    2865567
  • Title
    An improved categorization of classifier's sensitivity on sample selection bias
  • Author
    Fan, Wei ; Davidson, Ian ; Zadrozny, Bianca ; Yu, Philip S.
  • Author_Institution
    IBM T.J. Watson Res., Hawthorne, NY, USA
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias where the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on its class label y. A classifier learner is categorized as "local" if it is insensitive to this type of sample selection bias; otherwise, it is considered "global". In that paper, the true model is not clearly distinguished from the model that the algorithm outputs. In their discussion of Bayesian classifiers, logistic regression and hard-margin SVMs, the true model (or the model that generates the true class label for every example) is implicitly assumed to be contained in the model space of the learner, and the true class probabilities and model-estimated class probabilities are assumed to asymptotically converge as the training data set size increases. However, in the discussion of naive Bayes, decision trees and soft-margin SVMs, the model space is assumed not to contain the true model, and these three algorithms are instead argued to be "global learners". We argue that most classifier learners may or may not be affected by sample selection bias; this depends on the dataset as well as the heuristics or inductive bias implied by the learning algorithm and their appropriateness to the particular dataset.
  • Keywords
    Bayes methods; decision trees; learning (artificial intelligence); pattern classification; regression analysis; support vector machines; Bayesian classifier; classifier learning; classifier sensitivity categorization; hard-margin support vector machine; logistic regression; naive Bayes; sample selection bias; Bayesian methods; Classification tree analysis; Computer science; Data mining; Logistics; Regression tree analysis; Testing; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2005.24
  • Filename
    1565737
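
The kind of sample selection bias described in the abstract (selection depends on the feature vector x but not directly on the class label y) can be illustrated with a small exact calculation. The sketch below is not taken from the paper: the toy distribution, the selection probabilities, and the function names `biased_joint` and `conditional_y1` are all hypothetical, chosen only to show that P(y | x) is unchanged in the selected sample while the class prior shifts. Whether a given learner is actually affected then depends, as the abstract argues, on whether its model space and inductive bias suit the dataset.

```python
from fractions import Fraction as F

# Hypothetical toy distribution (illustrative numbers, not from the paper).
p_x = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}               # P(x)
p_y1_given_x = {0: F(1, 5), 1: F(1, 2), 2: F(4, 5)}      # P(y=1 | x)
p_sel_given_x = {0: F(9, 10), 1: F(1, 2), 2: F(1, 10)}   # P(s=1 | x), does not depend on y


def biased_joint(p_x, p_y1_given_x, p_sel_given_x):
    """Return P(x, y | s=1) when selection depends on x only."""
    unnorm = {}
    for x, px in p_x.items():
        for y in (0, 1):
            py = p_y1_given_x[x] if y == 1 else 1 - p_y1_given_x[x]
            unnorm[(x, y)] = px * py * p_sel_given_x[x]
    total = sum(unnorm.values())
    return {key: value / total for key, value in unnorm.items()}


def conditional_y1(joint, x):
    """P(y=1 | x) computed from a joint table over (x, y)."""
    return joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])


joint_s = biased_joint(p_x, p_y1_given_x, p_sel_given_x)

# The class-conditional probabilities are the same in the selected sample ...
cond_unchanged = all(conditional_y1(joint_s, x) == p_y1_given_x[x] for x in p_x)

# ... but the class prior P(y=1) is not.
prior_true = sum(p_x[x] * p_y1_given_x[x] for x in p_x)
prior_biased = sum(joint_s[(x, 1)] for x in p_x)

print(cond_unchanged, prior_true, prior_biased)
```

Exact fractions are used instead of random sampling so that the equality of the conditional probabilities can be checked without sampling noise.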