Title :
A sample selection algorithm based on maximum entropy and contribution
Author :
Zhang, Ning ; Xiao, Tao
Author_Institution :
Dept. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
Abstract :
The focus of a sample selection algorithm is to decide which samples to store for generalization. Storing too many samples leads to large storage requirements and slow execution, and also leads to overfitting at prediction time. This paper presents a new sample selection algorithm for the nearest neighbor rule. The algorithm defines an evaluation function for samples that combines maximum entropy with the contribution of each sample, and uses it to select the most valuable samples. The algorithm tends to select samples near class boundaries and achieves good prediction accuracy. Because a certain error rate is allowed on the training data, the algorithm is insensitive to noise. Experiments are conducted on both synthetic and real datasets.
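The abstract does not give the exact evaluation function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (1) the "maximum entropy" term is approximated by the Shannon entropy of class labels among a sample's k nearest neighbours, so boundary samples score higher; (2) a sample's "contribution" is the number of training samples that become correctly classified by the 1-NN rule when it is added to the stored set; (3) the two terms are combined linearly with a hypothetical weight `alpha`; and (4) samples are added greedily until the training error falls to a hypothetical tolerance `max_error`. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def neighborhood_entropy(i, X, y, k):
    """Assumed entropy term: Shannon entropy (bits) of the labels of the
    k nearest neighbours of X[i], excluding X[i] itself.
    Mixed labels nearby (high entropy) suggest X[i] lies near a boundary."""
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: dist2(X[i], X[j]))
    counts = Counter(y[j] for j in others[:k])
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def nn_label(x, S, X, y):
    """1-NN prediction for x using only the stored indices S."""
    return y[min(S, key=lambda j: dist2(x, X[j]))]


def correct_set(S, X, y):
    """Indices of training samples that 1-NN over S classifies correctly."""
    if not S:
        return set()
    return {i for i in range(len(X)) if nn_label(X[i], S, X, y) == y[i]}


def select_samples(X, y, k=3, alpha=1.0, max_error=0.0):
    """Hypothetical greedy selection: repeatedly store the candidate with the
    highest score = entropy + alpha * contribution / n, stopping once the
    1-NN training error is <= max_error (a nonzero tolerance lets noisy
    samples stay unstored)."""
    n = len(X)
    entropy = [neighborhood_entropy(i, X, y, k) for i in range(n)]
    S, correct = [], set()
    while 1 - len(correct) / n > max_error:
        best, best_score, best_correct = None, None, None
        for c in range(n):
            if c in S:
                continue
            new_correct = correct_set(S + [c], X, y)
            gain = len(new_correct) - len(correct)  # assumed contribution of c
            score = entropy[c] + alpha * gain / n
            if best_score is None or score > best_score:  # ties keep lowest index
                best, best_score, best_correct = c, score, new_correct
        if best is None:
            break  # every sample is already stored
        S.append(best)
        correct = best_correct
    return S
```

For example, with two well-separated 1-D clusters `X = [[0], [1], [2], [3], [10], [11], [12], [13]]` and `y = [0, 0, 0, 0, 1, 1, 1, 1]`, `select_samples(X, y)` returns `[0, 4]`: one stored sample per class already classifies every training point correctly. Raising `max_error` above zero mirrors, in a simplified way, the abstract's point that tolerating some training error makes selection less sensitive to noisy samples.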
Keywords :
maximum entropy methods; pattern classification; evaluation function; maximum entropy; nearest neighbor rule; sample contribution; sample selection algorithm; storage requirement; Classification algorithms; Entropy; Machine learning algorithms; Nearest neighbor searches; Noise; Prediction algorithms; Training; contribution; sample selection
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
DOI :
10.1109/ICMLC.2010.5581031