DocumentCode :
113941
Title :
An empirical study of filter-based feature selection algorithms using noisy training data
Author :
Weiwei Yuan ; Donghai Guan ; Linshan Shen ; Haiwei Pan
Author_Institution :
Dept. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
fYear :
2014
fDate :
26-28 April 2014
Firstpage :
209
Lastpage :
212
Abstract :
In this research, we empirically evaluate the performance of filter-based feature selection on noisy data containing mislabeled samples. Mislabeled data are present in many real applications, but existing studies have not explored their influence on feature selection. We tested six well-known filter feature selection methods on datasets with pre-defined mislabeled ratios. Our results show that, in most cases, feature selection performance degrades as the mislabeled ratio increases. We also evaluate the effects of mislabeled data on feature selection for small datasets, where the negative effects are more serious. The results of this study suggest that most feature selection methods are not robust enough for noisy data containing mislabeled samples. Therefore, proper processing of noisy data before feature selection should be considered.
Keywords :
data handling; learning (artificial intelligence); data feature selection; filter-based feature selection algorithms; mislabeled data; mislabeled ratio; noisy data processing; noisy training data; Accuracy; Filtering algorithms; Noise; Noise measurement; Training; Training data; feature selection; mislabeled data; small size data
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Science and Technology (ICIST), 2014 4th IEEE International Conference on
Conference_Location :
Shenzhen
Type :
conf
DOI :
10.1109/ICIST.2014.6920367
Filename :
6920367