• DocumentCode
    3221666
  • Title

    A Document Clustering Method Based on Hierarchical Algorithm with Model Clustering

  • Author

    Sun, Haojun ; Liu, Zhihui ; Kong, Lingjun

  • Author_Institution
    Shantou Univ., Shantou
  • fYear
    2008
  • fDate
    25-28 March 2008
  • Firstpage
    1229
  • Lastpage
    1233
  • Abstract
    Document clustering is an important tool for text analysis and is used in many applications. This work develops a novel hierarchical algorithm for document clustering. We are particularly interested in studying and exploiting the cluster overlap phenomenon to design cluster merging criteria. Our previous papers reported theoretical results on the overlap rate between clusters based on the Gaussian mixture model. In this paper, we propose a new way to compute the overlap rate in order to improve time efficiency and accuracy: we use a line passing through the two clusters' centers instead of the ridge curve. Within the hierarchical clustering method, we use the expectation-maximization (EM) algorithm on the Gaussian mixture model to estimate the parameters, and we merge the two sub-clusters whose overlap is the largest. Experiments on both public data and document clustering data show that this approach can improve the efficiency of clustering and save computing time.
  • Keywords
    Gaussian processes; expectation-maximisation algorithm; text analysis; Gaussian mixture model; document clustering method; expectation-maximization algorithm; hierarchical clustering method; model clustering; ridge curve; Application software; Clustering algorithms; Clustering methods; Computer networks; Electronic mail; Frequency; Gaussian distribution; Mathematical model; Mathematics; Partitioning algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
  • Conference_Location
    Okinawa
  • Print_ISBN
    978-0-7695-3096-3
  • Type

    conf

  • DOI
    10.1109/WAINA.2008.45
  • Filename
    4483087
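
The merging criterion described in the abstract can be illustrated with a rough sketch. The abstract does not give the overlap-rate formula, so everything below is an assumption for illustration only: each component is taken to be an isotropic Gaussian written as a hypothetical `(weight, mean, variance)` tuple, its parameters are assumed to have been estimated by EM already, and the overlap rate is approximated as the lowest mixture density on the segment joining the two centers divided by the lower of the two densities at the centers. The function names are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of an isotropic Gaussian with per-dimension variance `var`."""
    dim = len(mean)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    norm = (2.0 * math.pi * var) ** (dim / 2.0)
    return math.exp(-0.5 * sq_dist / var) / norm


def mixture_pdf(x, comp_a, comp_b):
    """Weighted density of the two-component mixture at point x."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in (comp_a, comp_b))


def overlap_on_center_line(comp_a, comp_b, steps=200):
    """Assumed overlap measure: the lowest mixture density sampled on the
    segment between the two centers, divided by the lower of the two
    densities at the centers. The result lies in (0, 1]."""
    mean_a, mean_b = comp_a[1], comp_b[1]
    densities = []
    for i in range(steps + 1):
        t = i / steps
        point = [a + t * (b - a) for a, b in zip(mean_a, mean_b)]
        densities.append(mixture_pdf(point, comp_a, comp_b))
    lower_peak = min(densities[0], densities[-1])
    return min(densities) / lower_peak


def most_overlapping_pair(components):
    """Return (rate, i, j) for the pair of components with the largest overlap."""
    best = None
    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            rate = overlap_on_center_line(components[i], components[j])
            if best is None or rate > best[0]:
                best = (rate, i, j)
    return best


# Illustrative components (made-up values, not data from the paper).
components = [
    (0.4, [0.0, 0.0], 1.0),
    (0.3, [1.0, 0.0], 1.0),
    (0.3, [10.0, 0.0], 1.0),
]
rate, i, j = most_overlapping_pair(components)
print(f"merge candidates: {i} and {j}, overlap rate {rate:.3f}")
```

In a hierarchical loop, the selected pair would be merged, the parameters re-estimated, and the search repeated until a stopping rule is met. Sampling along a straight segment stands in here for the ridge-curve computation that the abstract says the line replaces. Its cost grows with the number of sample points rather than with a curve search, which is one plausible reading of the claimed time saving.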