DocumentCode
1932092
Title
Document clustering using mixture model of von Mises-Fisher distributions on document manifold
Author
Nguyen Kim Anh ; Nguyen The Tam ; Ngo Van Linh
Author_Institution
Hanoi Univ. of Sci. & Technol., Hanoi, Vietnam
fYear
2013
fDate
15-18 Dec. 2013
Firstpage
140
Lastpage
145
Abstract
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. The generative model for document clustering based on the von Mises-Fisher (vMF) distribution generally produces better clustering results than other generative models. However, it is more natural to assume that the document space is a manifold and that the probability distribution generating the data is supported on that manifold. In this paper, we propose a regularized probabilistic model for data clustering, called the Laplacian regularized vMF Mixture Model (LapvMFs), which explicitly takes the manifold structure into account. We develop a generalized mean-field variational inference algorithm for LapvMFs. Extensive experiments on a large number of high-dimensional text datasets demonstrate that our approach outperforms three state-of-the-art clustering algorithms.
Keywords
data mining; mixture models; pattern clustering; statistical distributions; text analysis; Laplacian regularized vMF mixture model; LapvMF; document clustering; document manifold; probability distribution; text mining; von Mises-Fisher distribution; Clustering algorithms; Data models; Equations; Laplace equations; Manifolds; Mathematical model; Vectors; Probabilistic graphical model; clustering; graph Laplacian; manifold; variational inference
fLanguage
English
Publisher
IEEE
Conference_Titel
Soft Computing and Pattern Recognition (SoCPaR), 2013 International Conference of
Conference_Location
Hanoi
Print_ISBN
978-1-4799-3399-0
Type
conf
DOI
10.1109/SOCPAR.2013.7054116
Filename
7054116