• DocumentCode
    1845637
  • Title
    An Improved Density-Based Method for Reducing Training Data in KNN
  • Author
    Yongxia Jing ; Heping Gou ; Yaling Zhu
  • Author_Institution
    Dept. of Inf. Technol., Qiongtai Teachers Coll., Haikou, China
  • fYear
    2013
  • fDate
    21-23 June 2013
  • Firstpage
    972
  • Lastpage
    975
  • Abstract
    The k-Nearest Neighbor (KNN) algorithm is an effective text categorization algorithm in terms of recall and accuracy, but its computational overhead is directly proportional to the sample size, so classification is slow on large-scale sample data. To address this problem, the paper presents a density-based method for reducing training data. The method clusters each class of sample data into several clusters, removes noisy samples, and then combines highly similar sample documents within each cluster into one document. Experimental results indicate that the method reduces the computational overhead of KNN text classification while achieving performance approximately equal to that of traditional KNN.
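
The three steps named in the abstract (per-class clustering, noise removal, merging of highly similar documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the sketch assumes a DBSCAN-style density rule on cosine similarity and a greedy centroid merge. The function names and the thresholds `eps_sim`, `min_pts`, and `merge_sim` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def density_clusters(vectors, eps_sim, min_pts):
    """DBSCAN-style clustering (an assumption, not taken from the paper).

    A document is a core point if at least min_pts documents (itself
    included) have similarity >= eps_sim to it. Documents not reachable
    from any core point are treated as noise and left out of the result.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if cosine(vectors[i], vectors[j]) >= eps_sim]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    label = [None] * n
    clusters = []
    for i in range(n):
        if not core[i] or label[i] is not None:
            continue
        cid = len(clusters)
        label[i] = cid
        members, stack = [], [i]
        while stack:
            p = stack.pop()
            members.append(p)
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if label[q] is None:
                    label[q] = cid
                    stack.append(q)
        clusters.append(sorted(members))
    return clusters


def merge_similar(vectors, members, merge_sim):
    """Greedily fold each document into the first centroid it resembles."""
    reps = []  # each entry is [component-wise sum, document count]
    for i in members:
        v = vectors[i]
        for rep in reps:
            centroid = [s / rep[1] for s in rep[0]]
            if cosine(centroid, v) >= merge_sim:
                rep[0] = [s + x for s, x in zip(rep[0], v)]
                rep[1] += 1
                break
        else:
            reps.append([list(v), 1])
    return [[s / count for s in total] for total, count in reps]


def reduce_training_set(docs, eps_sim=0.8, min_pts=2, merge_sim=0.95):
    """docs: list of (vector, label) pairs. Returns a smaller list of pairs."""
    by_class = defaultdict(list)
    for vec, lab in docs:
        by_class[lab].append(vec)
    reduced = []
    for lab, vecs in by_class.items():
        for members in density_clusters(vecs, eps_sim, min_pts):
            merged = merge_similar(vecs, members, merge_sim)
            reduced.extend((v, lab) for v in merged)
    return reduced


if __name__ == "__main__":
    docs = [
        ([1.0, 0.0], "a"), ([1.0, 0.05], "a"), ([0.0, 1.0], "a"),
        ([0.0, 1.0], "b"), ([0.1, 1.0], "b"),
    ]
    for vec, lab in reduce_training_set(docs):
        print(lab, vec)
```

In this toy input the third class "a" document has no similar neighbor within its own class, so it is discarded as noise, and each remaining pair collapses into one averaged document. A KNN classifier would then search the reduced set instead of the full one; the sketch's pairwise similarity loop is quadratic per class and is meant only to show the idea.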
  • Keywords
    pattern classification; pattern clustering; text analysis; KNN algorithm; KNN text classification; classification speed; density-based method; documents; k-nearest neighbor algorithm; large-scale sample data; noise sample data reduction; sample data clustering; sample size; text categorization algorithm; training data reduction; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Noise; Support vector machine classification; Text categorization; Training; samples reducing; similarity; text clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on
  • Conference_Location
    Shiyang
  • Type
    conf
  • DOI
    10.1109/ICCIS.2013.261
  • Filename
    6643177