Title :
Training data selection based on fuzzy c-means
Author :
Guan, Donghai ; Yuan, Weiwei ; Lee, Young Koo ; Lee, Sungyoung
Author_Institution :
Dept. of Comput. Eng., Kyung Hee Univ., Seoul
Abstract :
The performance of supervised learning could be improved when valuable data are selected for training. In this paper, we propose three data selection methods based on the fuzzy c-means (FCM) algorithm: center-based selection, border-based selection and bin-based selection. In center-based selection, the data with a high degree of membership in each cluster are selected for training. In border-based selection, the data around the borders between clusters are selected. In bin-based selection, the data in each cluster are sorted by their membership degrees, the sorted data of each cluster are divided into bins, and one sample is selected from each bin for training. The three methods are studied empirically on a set of UCI data sets. Experimental results indicate that bin-based selection could effectively improve learning performance compared with randomly selecting training samples.
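The abstract describes the three selection strategies only in outline, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes standard FCM with fuzzifier m = 2 and a fixed iteration count, assigns each point to its highest-membership cluster, measures "near a border" as the gap between a point's two largest memberships, and draws one random sample per bin. All function names (fuzzy_c_means, center_based, border_based, bin_based) are hypothetical.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Standard FCM (assumed). Returns (u, centers); u[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):  # fixed iteration count instead of a convergence test (assumption)
        # Center update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / tot for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        for i in range(n):
            dists = [max(eps, sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5)
                     for j in range(c)]
            u[i] = [1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)]
    return u, centers


def _clusters(u):
    """Group point indices by their highest-membership cluster (hard assignment, an assumption)."""
    groups = {}
    for i, row in enumerate(u):
        j = max(range(len(row)), key=row.__getitem__)
        groups.setdefault(j, []).append(i)
    return groups


def center_based(u, per_cluster):
    """Keep the per_cluster points with the highest membership in each cluster."""
    chosen = []
    for j, idx in sorted(_clusters(u).items()):
        idx = sorted(idx, key=lambda i: u[i][j], reverse=True)
        chosen.extend(idx[:per_cluster])
    return chosen


def border_based(u, k):
    """Keep the k points whose two largest memberships are closest (needs c >= 2)."""
    def margin(i):
        top = sorted(u[i], reverse=True)
        return top[0] - top[1]
    return sorted(range(len(u)), key=margin)[:k]


def bin_based(u, n_bins, seed=0):
    """Sort each cluster by membership, split it into bins, draw one point per bin."""
    rng = random.Random(seed)
    chosen = []
    for j, idx in sorted(_clusters(u).items()):
        idx = sorted(idx, key=lambda i: u[i][j], reverse=True)
        nb = min(n_bins, len(idx))  # a small cluster gets one point per bin
        for b in range(nb):
            lo, hi = b * len(idx) // nb, (b + 1) * len(idx) // nb
            chosen.append(rng.choice(idx[lo:hi]))
    return chosen


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05),
           (5.0, 5.0), (5.1, 4.9), (4.8, 5.2), (5.2, 5.1),
           (2.5, 2.5)]
    u, centers = fuzzy_c_means(pts, c=2)
    print(center_based(u, per_cluster=2))
    print(border_based(u, k=1))  # expected: [8], the midpoint (2.5, 2.5)
    print(bin_based(u, n_bins=3))
```

Bin-based selection differs from center-based selection in that it spreads the chosen samples over the full range of membership degrees in each cluster instead of taking only the most central points; the bin boundaries here are equal-count slices of the sorted list, which is one possible reading of "divided into bins".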
Keywords :
fuzzy set theory; learning (artificial intelligence); pattern clustering; bin-based selection; border-based selection; center-based selection; fuzzy C-means; randomly selecting training samples; supervised learning; training data selection; Fuzzy systems; Training data;
Conference_Title :
Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence). IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-1818-3
ISSN :
1098-7584
DOI :
10.1109/FUZZY.2008.4630456