DocumentCode
2256245
Title
A sample selection algorithm based on maximum entropy and contribution
Author
Zhang, Ning ; Xiao, Tao
Author_Institution
Dept. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
Volume
1
fYear
2010
fDate
11-14 July 2010
Firstpage
397
Lastpage
402
Abstract
A sample selection algorithm decides which samples to store for generalization. Storing too many samples increases storage requirements, slows execution, and can lead to overfitting at prediction time. This paper presents a new sample selection algorithm for the nearest neighbor rule. The algorithm defines an evaluation function that combines the maximum entropy and the contribution of a sample, and it selects the most valuable samples according to that function. The algorithm prefers samples on the boundary and can achieve good prediction accuracy. Because a certain error rate is allowed on the training data, the algorithm is insensitive to noise. Experiments are conducted on both synthetic and real datasets.
Keywords
maximum entropy methods; pattern classification; evaluation function; maximum entropy; nearest neighbor rule; sample contribution; sample selection algorithm; storage requirement; Classification algorithms; Entropy; Machine learning algorithms; Nearest neighbor searches; Noise; Prediction algorithms; Training; contribution; maximum entropy; nearest neighbor rule; sample selection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location
Qingdao
Print_ISBN
978-1-4244-6526-2
Type
conf
DOI
10.1109/ICMLC.2010.5581031
Filename
5581031