Title :
Error signal distribution as an indicator of imbalanced data
Author :
Furundzic, Drasko ; Stankovic, Stevan ; Dimic, Goran
Author_Institution :
Mihajlo Pupin Inst., Belgrade, Serbia
Abstract :
This paper defines criteria for assessing the imbalance of datasets used to train predictive learning models. The most important criterion for evaluating imbalance is the distribution of the error signal over the space of a local measure of distances between the points of the training set. The paper presents an analysis of this indicator for sets with various distributions and shows that, for the identity mapping of data sets from the real domain, the greatest information potential is contained in data whose internal distribution is uniform.
Keywords :
data handling; learning (artificial intelligence); statistical distributions; data sets; error signal distribution; imbalanced data; internal distribution; local measure; predictive learning models; training set; Approximation methods; Data mining; Data models; Electronic mail; Entropy; Predictive models; Training; Imbalanced data; imbalanced learning; predictive models
Conference_Title :
Neural Network Applications in Electrical Engineering (NEUREL), 2014 12th Symposium on
Conference_Location :
Belgrade
Print_ISBN :
978-1-4799-5887-0
DOI :
10.1109/NEUREL.2014.7011503