• DocumentCode
    3310602
  • Title

    A new partitioning based algorithm for document clustering

  • Author

    Zonghu Wang ; Zhijing Liu ; Donghui Chen ; Kai Tang

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
  • Volume
    3
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1741
  • Lastpage
    1745
  • Abstract
    Document clustering is one of the key problems in text mining and information retrieval. It groups text documents in a way that maximizes the similarity within clusters and minimizes the similarity between different clusters. Most partitioning based algorithms are sensitive to the initial centroids, so the clustering result depends heavily on how they are chosen. This paper first uses an unsupervised feature selection method to reduce the dimension of the document feature space and then proposes a novel partitioning based algorithm that selects initial cluster centroids during the clustering process according to the size and density of the clusters in the dataset. Experiments on several text datasets show that the proposed approach effectively improves the quality of clustering.
  • Keywords
    feature extraction; information retrieval; text analysis; clustering quality; document clustering; document feature space; partitioning based algorithm; text datasets; text mining; unsupervised feature selection method; Algorithm design and analysis; Clustering algorithms; Complexity theory; Databases; Educational institutions; Heuristic algorithms; Partitioning algorithms; centroid; clustering; feature; mention; partitioning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type

    conf

  • DOI
    10.1109/FSKD.2011.6019857
  • Filename
    6019857