Title of article :
Using backward elimination with a new model order reduction algorithm to select best double mixture model for document clustering
Author/Authors :
Azadi, Tahereh Emami and Almasganj, Farshad
Issue Information :
Journal serial issue, year 2009
Pages :
9
From page :
10485
To page :
10493
Abstract :
Probabilistic latent semantic analysis (PLSA) is a double-structure mixture model that is widely used in text and web mining. It uses a number of latent variables to establish hidden semantic relations among the observed features. In this approach, selecting the correct number of latent variables is critical. In most previous research, the number of latent topics was set according to the number of classes involved. This paper presents a method, based on backward elimination, that performs unsupervised order selection in PLSA. The method starts with a model that has more components than needed and then prunes the mixtures until they reach their optimum size. During elimination, the central problem is choosing which latent variables to delete, because this choice directly affects the final performance of the pruned model. To address this problem, we introduce a new combined pruning method that selects the best candidates for removal while keeping the computational cost low. We conducted experiments on two datasets from the Reuters-21578 corpus. The results show that the algorithm yields an optimized number of latent variables and, in turn, achieves better clustering performance than conventional model selection methods. It also outperforms a PLSA model with a fixed number of latent variables equal to the true number of clusters.
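The backward-elimination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the combined pruning criterion, so the rule used here (drop the component with the smallest P(z)) is only a placeholder. The BIC parameter count, the use of the total token count as the sample size, and all function names (`fit_plsa`, `bic`, `backward_elimination`) are likewise assumptions made for illustration.

```python
import numpy as np

def fit_plsa(counts, p_z, p_d_z, p_w_z, n_iter=50, eps=1e-12):
    """EM for the symmetric PLSA model P(d, w) = sum_z P(z) P(d|z) P(w|z)."""
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), shape (K, D, W)
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: re-estimate parameters from count-weighted posteriors
        weighted = post * counts[None, :, :]
        mass = weighted.sum(axis=(1, 2)) + eps
        p_d_z = weighted.sum(axis=2) / mass[:, None]
        p_w_z = weighted.sum(axis=1) / mass[:, None]
        p_z = mass / mass.sum()
    return p_z, p_d_z, p_w_z

def bic(counts, p_z, p_d_z, p_w_z, eps=1e-12):
    """BIC = -2 log L + (free parameters) * log(N); N = total tokens (assumption)."""
    p_dw = np.einsum("k,kd,kw->dw", p_z, p_d_z, p_w_z)
    log_lik = (counts * np.log(p_dw + eps)).sum()
    k = len(p_z)
    d, w = counts.shape
    n_params = (k - 1) + k * (d - 1) + k * (w - 1)
    return float(-2.0 * log_lik + n_params * np.log(counts.sum()))

def backward_elimination(counts, k_start, k_min=1, seed=0):
    """Start with k_start components, prune one at a time, keep the best BIC."""
    rng = np.random.default_rng(seed)
    d, w = counts.shape
    params = fit_plsa(
        counts,
        np.full(k_start, 1.0 / k_start),
        rng.dirichlet(np.ones(d), size=k_start),
        rng.dirichlet(np.ones(w), size=k_start),
    )
    best_score, best_params = bic(counts, *params), params
    while len(params[0]) > k_min:
        p_z, p_d_z, p_w_z = params
        # Placeholder pruning rule (NOT the paper's combined criterion):
        # remove the component with the smallest mixing weight.
        keep = np.arange(len(p_z)) != np.argmin(p_z)
        p_z = p_z[keep] / p_z[keep].sum()
        # Re-fit the smaller model, warm-started from the surviving components
        params = fit_plsa(counts, p_z, p_d_z[keep], p_w_z[keep])
        score = bic(counts, *params)
        if score < best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Calling `backward_elimination(counts, k_start=20)` on a document-by-word count matrix returns the parameters of the model with the lowest BIC seen during pruning. The number of selected latent variables is `len(best_params[0])`. The dense (K, D, W) posterior array is only practical for small matrices.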
Keywords :
Document clustering , Model selection , EM algorithm , Bayesian Information Criterion (BIC) , pLSA
Journal title :
Expert Systems with Applications
Serial Year :
2009
Record number :
2346816