• DocumentCode
    1932092
  • Title
    Document clustering using mixture model of von Mises-Fisher distributions on document manifold
  • Author
    Nguyen Kim Anh ; Nguyen The Tam ; Ngo Van Linh
  • Author_Institution
    Hanoi Univ. of Sci. & Technol., Hanoi, Vietnam
  • fYear
    2013
  • fDate
    15-18 Dec. 2013
  • Firstpage
    140
  • Lastpage
    145
  • Abstract
    Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. Generative models for document clustering based on the von Mises-Fisher (vMF) distribution generally produce better clustering results than other generative models. However, it is more natural to assume that the document space is a manifold and that the probability distribution generating the data is supported on that manifold. In this paper, we propose a regularized probabilistic model for data clustering, called the Laplacian regularized vMF Mixture Model (LapvMFs), which explicitly takes the manifold structure into account. We develop a generalized mean-field variational inference algorithm for LapvMFs. Extensive experiments on a large number of high-dimensional text datasets demonstrate that our approach outperforms three state-of-the-art clustering algorithms.
  • Keywords
    data mining; mixture models; pattern clustering; statistical distributions; text analysis; Laplacian regularized vMF mixture model; LapvMF; document clustering; document manifold; probability distribution; text mining; von Mises-Fisher distribution; Clustering algorithms; Data models; Equations; Laplace equations; Manifolds; Mathematical model; Vectors; Probabilistic graphical model; clustering; graph Laplacian; manifold; variational inference
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Soft Computing and Pattern Recognition (SoCPaR), 2013 International Conference of
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4799-3399-0
  • Type
    conf
  • DOI
    10.1109/SOCPAR.2013.7054116
  • Filename
    7054116