• DocumentCode
    2256245
  • Title
    A sample selection algorithm based on maximum entropy and contribution
  • Author
    Zhang, Ning; Xiao, Tao
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
  • Volume
    1
  • fYear
    2010
  • fDate
    11-14 July 2010
  • Firstpage
    397
  • Lastpage
    402
  • Abstract
    The focus of a sample selection algorithm is to decide which samples to store for generalization. Storing too many samples results in large storage requirements and slow execution, and it can lead to overfitting at prediction time. This paper presents a new sample selection algorithm for the nearest neighbor rule. The algorithm defines an evaluation function for samples that combines maximum entropy with the contribution of a sample, and it selects the most valuable samples according to that function. The algorithm tends to select samples on the class boundary, and it achieves good prediction accuracy. Because a certain error rate is allowed on the training data, the algorithm is insensitive to noise. Experiments are conducted on both synthetic and real datasets.
  • Keywords
    maximum entropy methods; pattern classification; evaluation function; maximum entropy; nearest neighbor rule; sample contribution; sample selection algorithm; storage requirement; Classification algorithms; Entropy; Machine learning algorithms; Nearest neighbor searches; Noise; Prediction algorithms; Training; contribution; sample selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
  • Conference_Location
    Qingdao
  • Print_ISBN
    978-1-4244-6526-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2010.5581031
  • Filename
    5581031