DocumentCode :
2752798
Title :
A preliminary study on missing data imputation in evolutionary fuzzy systems of subgroup discovery
Author :
Carmona, C.J. ; Luengo, J. ; González, P. ; del Jesus, M.J.
Author_Institution :
Dept. of Comput. Sci., Univ. of Jaen, Jaen, Spain
fYear :
2012
fDate :
10-15 June 2012
Firstpage :
1
Lastpage :
7
Abstract :
In real-life data mining, information is frequently lost because of missing values in the attributes. Missing values can arise from problems in manual data entry procedures, equipment errors or incorrect measurements. Their presence affects the results obtained by any knowledge extraction approach. In subgroup discovery specifically, it can reduce the quality of the subgroups obtained on measures such as sensitivity, confidence, significance or unusualness. This paper presents an experimental study analysing the effect of different missing data imputation mechanisms on the subgroup discovery algorithms based on evolutionary fuzzy systems that have been presented in the literature. The analysis is carried out on a large number of data sets obtained from the KEEL repository. Among all the imputation techniques, K-Nearest Neighbour imputation stands out as the best option. In summary, experts who need to analyse a problem with a high percentage of missing values should use this imputation method in order to treat the data correctly and to obtain meaningful descriptive knowledge. In addition, the results show that NMEEF-SD is the evolutionary fuzzy system with the best results in the missing values scenario.
Keywords :
data analysis; data mining; evolutionary computation; fuzzy set theory; fuzzy systems; pattern classification; KEEL repository; NMEEF-SD algorithm; data mining; data sets; equipment errors; evolutionary fuzzy systems; incorrect measurements; k-nearest neighbour; knowledge extraction; manual data entry procedures; missing data imputation mechanisms; missing values; subgroup discovery algorithms; Algorithm design and analysis; Argon; Data mining; Educational institutions; Fuzzy systems; Guidelines; Sensitivity; Evolutionary Fuzzy System; Missing Data Imputation; Subgroup Discovery;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems (FUZZ-IEEE), 2012 IEEE International Conference on
Conference_Location :
Brisbane, QLD
ISSN :
1098-7584
Print_ISBN :
978-1-4673-1507-4
Electronic_ISBN :
1098-7584
Type :
conf
DOI :
10.1109/FUZZ-IEEE.2012.6251182
Filename :
6251182