• DocumentCode
    2877510
  • Title

    An Efficient Clustering Algorithm for Small Text Documents

  • Author

    Liu, Yubao ; Cai, Jiarong ; Yin, Jian ; Huang, Zhilan

  • Author_Institution
    Sun Yat-Sen University, China
  • fYear
    2006
  • fDate
    38869
  • Firstpage
    16
  • Lastpage
    16
  • Abstract
    Clustering text documents into category groups is an important problem, and the size of the desired clusters is an important requirement for a clustering solution. In this paper, we present RTC, an efficient clustering algorithm for small text documents based on the spherical k-means algorithm. RTC introduces a new method for choosing initial centers that combines density and farthest-distance strategies. Building on the first-variation adjustment of the Ping-Pong algorithm, we also present a new partition adjustment method guided by the set of border objects of the clusters. We test the algorithm's performance on the Chinese natural language platform. The experimental results show that RTC outperforms spherical k-means and bisecting k-means in clustering accuracy, and outperforms Ping-Pong in both clustering accuracy and clustering time. In particular, RTC is sometimes 5 times faster than Ping-Pong.
  • Keywords
    Clustering algorithms; Computer science; Euclidean distance; Information security; Laboratories; Natural languages; Partitioning algorithms; Refining; Sun; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web-Age Information Management Workshops, 2006. WAIM '06. Seventh International Conference on
  • Conference_Location
    Hong Kong, China
  • Print_ISBN
    0-7695-2705-1
  • Type

    conf

  • DOI
    10.1109/WAIMW.2006.4
  • Filename
    4027176