DocumentCode :
3382566
Title :
Addressing covariate shift for Genetic Fuzzy Systems classifiers: A case of study with FARC-HD for imbalanced datasets
Author :
Lopez, Victor ; Fernandez, Alicia ; Herrera, Francisco
Author_Institution :
Dept. of Comput. Sci. & A.I., Univ. of Granada, Granada, Spain
fYear :
2013
fDate :
7-10 July 2013
Firstpage :
1
Lastpage :
8
Abstract :
The quality of learned models in Data Mining has traditionally been estimated by means of a k-fold partitioning technique. However, the “random” division of the instances over the folds may result in a problem known as covariate shift, i.e. the training and test folds follow different data distributions. In classification with imbalanced datasets this problem is more severe: minority class instances are misclassified because the real boundaries are learned incorrectly from a poorly defined data distribution, and this strongly affects the performance measures in this scenario. To avoid this harmful situation, we propose the use of a specific validation technique for partitioning the data, known as “Distribution optimally balanced stratified cross-validation”. This methodology places close-by samples in different folds, so that each partition ends up with enough representatives of every region. In this contribution, we show the goodness of this methodology using Genetic Fuzzy Systems, as they are known to be robust approaches for all types of classification problems. Specifically, we have chosen the FARC-HD algorithm, a novel technique which has been shown to obtain very accurate results. From the experimental analysis, which is carried out on a large number of imbalanced datasets, we emphasize the necessity of using a proper validation methodology for extracting well-founded conclusions.
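The idea of placing close-by samples in different folds can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dob_scv_folds`, the use of Euclidean distance, and the seeding scheme are assumptions made for illustration only. For each class, it repeatedly picks a random unassigned sample, puts it in the first fold, and spreads its nearest unassigned same-class neighbours over the remaining folds.

```python
import math
import random
from collections import defaultdict


def dob_scv_folds(X, y, k=5, seed=0):
    """Return a fold index in [0, k) for every sample (hypothetical helper).

    X is a sequence of numeric feature tuples, y the class labels.
    Euclidean distance is an assumption of this sketch.
    """
    rng = random.Random(seed)
    folds = [None] * len(X)

    # Group sample indices by class so each class is spread separately.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    for indices in by_class.values():
        unassigned = set(indices)
        while unassigned:
            # Pick a random unassigned sample and place it in the first fold.
            e0 = rng.choice(sorted(unassigned))
            unassigned.remove(e0)
            folds[e0] = 0
            # Its k-1 nearest unassigned same-class neighbours go to the
            # remaining folds, one neighbour per fold (ties broken by index).
            neighbours = sorted(
                unassigned, key=lambda j: (math.dist(X[e0], X[j]), j)
            )[: k - 1]
            for fold, j in enumerate(neighbours, start=1):
                folds[j] = fold
                unassigned.remove(j)
    return folds


if __name__ == "__main__":
    # Toy imbalanced data: 10 majority samples and 5 minority samples.
    X = [(float(i), 0.0) for i in range(10)] + [(float(i), 5.0) for i in range(5)]
    y = [0] * 10 + [1] * 5
    print(dob_scv_folds(X, y, k=5, seed=1))
```

In this toy example every fold receives two majority samples and one minority sample, so no fold is left without representatives of the minority region. When a class size is not a multiple of k, the earlier folds receive the extra samples.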
Keywords :
data mining; fuzzy set theory; genetic algorithms; learning (artificial intelligence); pattern classification; FARC-HD algorithm; close-by sample placing; covariate shift; data distribution; data mining; data partitioning; distribution optimally balanced stratified cross-validation; genetic fuzzy systems classifiers; imbalanced datasets; k-fold partition technique; minority class instance misclassification; quality estimation; random division; standard learning algorithms; validation technique; Algorithm design and analysis; Estimation; Fuzzy systems; Genetics; Partitioning algorithms; Standards; Training; Covariate Shift; Dataset Shift; Genetic Fuzzy Systems; Imbalanced Datasets; Partitioning; Validation Techniques;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems (FUZZ), 2013 IEEE International Conference on
Conference_Location :
Hyderabad
ISSN :
1098-7584
Print_ISBN :
978-1-4799-0020-6
Type :
conf
DOI :
10.1109/FUZZ-IEEE.2013.6622396
Filename :
6622396