Title :
A preliminary study on missing data imputation in evolutionary fuzzy systems of subgroup discovery
Author :
Carmona, C.J. ; Luengo, J. ; González, P. ; del Jesus, M.J.
Author_Institution :
Dept. of Comput. Sci., Univ. of Jaen, Jaen, Spain
Abstract :
In real-life data, information is frequently lost in data mining because of missing values in the attributes. Missing values can arise from problems in manual data entry procedures, equipment errors or incorrect measurements. Their presence conditions the results obtained by any knowledge extraction approach. In subgroup discovery specifically, this problem can reduce the quality of the subgroups obtained, as measured by sensitivity, confidence, significance or unusualness. This paper presents an experimental study that analyses the effect of different missing data imputation mechanisms on the subgroup discovery algorithms based on evolutionary fuzzy systems presented in the literature. The analysis is carried out on a large number of data sets obtained from the KEEL repository. Among the imputation techniques considered, K-Nearest Neighbour imputation stands out as the best option. In summary, experts who need to analyse a problem with a high percentage of missing values should use this imputation method to treat the data correctly and to obtain meaningful descriptive knowledge. The results also show that, in the missing values scenario, NMEEF-SD is the evolutionary fuzzy system that obtains the best results.
Keywords :
data analysis; data mining; evolutionary computation; fuzzy set theory; fuzzy systems; pattern classification; KEEL repository; NMEEF-SD algorithm; data sets; equipment errors; evolutionary fuzzy systems; incorrect measurements; k-nearest neighbour; knowledge extraction; manual data entry procedures; missing data imputation mechanisms; missing values; subgroup discovery algorithms; Algorithm design and analysis; Argon; Educational institutions; Guidelines; Sensitivity; Evolutionary Fuzzy System; Missing Data Imputation; Subgroup Discovery
Conference_Title :
Fuzzy Systems (FUZZ-IEEE), 2012 IEEE International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1507-4
Electronic_ISBN :
1098-7584
DOI :
10.1109/FUZZ-IEEE.2012.6251182