• DocumentCode
    466874
  • Title

    An Effective Method To Improve kNN Text Classifier

  • Author

    Hao, Xiulan ; Tao, Xiaopeng ; Zhang, Chenghong ; Hu, Yunfa

  • Author_Institution
    Fudan Univ., Shanghai
  • Volume
    1
  • fYear
    2007
  • fDate
    July 30 2007-Aug. 1 2007
  • Firstpage
    379
  • Lastpage
    384
  • Abstract
    Many standard classification algorithms assume that the training examples are evenly distributed among the classes. However, unbalanced data sets appear in many applications. As a simple, effective categorization method, kNN is widely used, but it also suffers on biased data sets. In developing the Prototype of Internet Information Security for the Shanghai Council of Information and Security, we found that when the training data set is biased, almost all test documents of some rare categories are classified into common ones. To alleviate this problem, we propose a novel concept, the critical point (CP), and adapt traditional kNN by integrating CP's approximate value (LB or UB) and the training number into the decision rules. Extensive experiments show that the adapted kNN achieves a significant improvement in classification performance on biased corpora.
  • Keywords
    text analysis; classification algorithms; critical point; kNN; text classifier; Artificial intelligence; Computer networks; Concurrent computing; Distributed computing; Information security; Internet; Management training; Software engineering; Testing; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2007. SNPD 2007. Eighth ACIS International Conference on
  • Conference_Location
    Qingdao
  • Print_ISBN
    978-0-7695-2909-7
  • Type

    conf

  • DOI
    10.1109/SNPD.2007.296
  • Filename
    4287536