• DocumentCode
    2247382
  • Title
    A clustering-Based KNN improved algorithm CLKNN for text classification
  • Author
    Zhou, Lijuan ; Wang, Linshuang ; Ge, Xuebin ; Shi, Qian
  • Author_Institution
    Inf. Eng. Coll., Capital Normal Univ., Beijing, China
  • Volume
    3
  • fYear
    2010
  • fDate
    6-7 March 2010
  • Firstpage
    212
  • Lastpage
    215
  • Abstract
    As a simple, effective, and nonparametric classification method, k Nearest Neighbor (KNN) is widely used in document classification, including harder settings such as large-scale collections or many categories. However, a KNN classifier can lose precision when the training samples are unevenly distributed, because of the uneven density of the training data. To solve this problem, this paper presents a new clustering-based KNN method. It preprocesses the training data by clustering, then classifies with a new KNN algorithm that dynamically adjusts the neighborhood number K in each iteration. This method avoids the uneven classification phenomenon and reduces the misjudgment of boundary test samples. An experiment in text classification shows that the method performs well.
  • Keywords
    pattern classification; pattern clustering; text analysis; KNN algorithm; KNN classifier; boundary testing; clustering-based KNN method; document classification; dynamic adjustment; k nearest neighbor; nonparametric classification method; text classification; training data; Classification algorithms; Clustering algorithms; Educational institutions; Large-scale systems; Nearest neighbor searches; Robotics and automation; Support vector machine classification; Support vector machines; Testing; Text categorization; Clustering; KNN; Text Classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Informatics in Control, Automation and Robotics (CAR), 2010 2nd International Asia Conference on
  • Conference_Location
    Wuhan
  • ISSN
    1948-3414
  • Print_ISBN
    978-1-4244-5192-0
  • Electronic_ISBN
    1948-3414
  • Type
    conf
  • DOI
    10.1109/CAR.2010.5456668
  • Filename
    5456668