• DocumentCode
    3382566
  • Title

    Addressing covariate shift for Genetic Fuzzy Systems classifiers: A case of study with FARC-HD for imbalanced datasets

  • Author

    Lopez, Victor ; Fernandez, Alicia ; Herrera, Francisco

  • Author_Institution
    Dept. of Comput. Sci. & A.I., Univ. of Granada, Granada, Spain
  • fYear
    2013
  • fDate
    7-10 July 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The quality of learned models in Data Mining has traditionally been estimated by means of a k-fold partition technique. However, the “random” division of the instances over the folds may result in a problem known as covariate shift, i.e., the training and test folds follow different data distributions. In classification with imbalanced datasets this problem is more severe: the misclassification of minority class instances, caused by learning the real boundaries incorrectly from a poorly defined data distribution, strongly affects the performance measures in this scenario. To avoid this harmful situation, we propose the use of a specific validation technique for partitioning the data, known as “Distribution optimally balanced stratified cross-validation”. This methodology places close-by samples in different folds, so that each partition ends up with enough representatives of every region. In this contribution, we show the goodness of this methodology using Genetic Fuzzy Systems, as they are known to be robust approaches for all types of classification problems. Specifically, we have chosen the FARC-HD algorithm, a novel technique which has been shown to obtain very accurate results. From the experimental analysis, which is carried out on a large number of imbalanced datasets, we emphasize the necessity of using a proper validation methodology for extracting well-founded conclusions.
  • Keywords
    data mining; fuzzy set theory; genetic algorithms; learning (artificial intelligence); pattern classification; FARC-HD algorithm; close-by sample placing; covariate shift; data distribution; data mining; data partitioning; distribution optimally balanced stratified cross-validation; genetic fuzzy systems classifiers; imbalanced datasets; k-fold partition technique; minority class instance misclassification; quality estimation; random division; standard learning algorithms; validation technique; Algorithm design and analysis; Estimation; Fuzzy systems; Genetics; Partitioning algorithms; Standards; Training; Covariate Shift; Dataset Shift; Genetic Fuzzy Systems; Imbalanced Datasets; Partitioning; Validation Techniques;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems (FUZZ), 2013 IEEE International Conference on
  • Conference_Location
    Hyderabad
  • ISSN
    1098-7584
  • Print_ISBN
    978-1-4799-0020-6
  • Type

    conf

  • DOI
    10.1109/FUZZ-IEEE.2013.6622396
  • Filename
    6622396
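
The partitioning idea summarized in the Abstract (placing close-by samples of the same class in different folds) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `dob_scv_folds`, the use of Euclidean distance, and the random choice of the starting sample are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


def dob_scv_folds(X, y, k=5, seed=0):
    """Return a fold index in [0, k) for every sample.

    Sketch of a distribution-balanced stratified split (assumed details):
    within each class, pick an unassigned sample, put it in fold 0, then
    spread its k-1 nearest unassigned same-class neighbours over folds
    1..k-1, so neighbouring samples land in different folds.
    """
    rng = random.Random(seed)
    folds = [None] * len(X)

    # Group sample indices by class label (stratification).
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    for indices in by_class.values():
        unassigned = set(indices)
        while unassigned:
            # Random starting sample among those not yet placed.
            start = rng.choice(sorted(unassigned))
            unassigned.remove(start)
            folds[start] = 0

            # Its k-1 nearest unassigned neighbours of the same class
            # (Euclidean distance is an assumption of this sketch).
            neighbours = sorted(
                unassigned, key=lambda j: math.dist(X[start], X[j])
            )[: k - 1]
            for fold, j in enumerate(neighbours, start=1):
                folds[j] = fold
                unassigned.remove(j)
    return folds


# Hypothetical toy data: 10 majority-class and 4 minority-class points.
X_demo = [(float(i), 0.0) for i in range(10)] + [(float(i), 5.0) for i in range(4)]
y_demo = [0] * 10 + [1] * 4
print(dob_scv_folds(X_demo, y_demo, k=5))
```

In this sketch each class is handled separately, so the per-class counts across folds differ by at most one, and samples that are close in feature space end up in different folds rather than clustering in a single one.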