DocumentCode :
2209659
Title :
Efficient Probabilistic Latent Semantic Analysis with Sparsity Control
Author :
Liu, Sen ; Xia, Chaolun ; Jiang, Xiaohong
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
905
Lastpage :
910
Abstract :
Probabilistic latent semantic analysis (PLSA) is a topic modeling technique for discovering hidden structure in binary and count data. As a mixture model, it performs a probabilistic mixture decomposition of the co-occurrence matrix, producing two matrices that admit probabilistic interpretations. However, the factorized matrices may be rather smooth, which means we may obtain global feature and topic representations rather than the expected local ones. One solution to this problem is to revise the decomposition process to take sparsity into account. In this paper, we present an approach that provides direct control over sparsity during the expectation maximization process. Furthermore, by using the log penalty function as the sparsity measure instead of the widely used L2 norm, we can approximate the re-estimation of parameters in linear time, the same as the original PLSA, while many other approaches require much more time. Experiments on face databases are reported to show visual representations of the local features obtained, and detailed improvements in clustering tasks compared with the original process.
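For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal, illustrative implementation of plain PLSA fitted with expectation maximization, not the paper's sparsity-controlled variant: the log-penalty update is not given in this record, so it is deliberately left out. The function name `plsa_em`, its parameters, and the toy initialization are assumptions made for the example.

```python
import random


def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Plain PLSA fitted with EM on a document-by-word count matrix.

    counts[d][w] is the number of occurrences of word w in document d.
    Returns (p_w_z, p_z_d): rows of P(w|z) per topic and P(z|d) per document.
    Hypothetical helper for illustration; no sparsity penalty is applied.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total <= 0:
            # Fall back to a uniform distribution for an empty row.
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization of both factor matrices.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iters):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue  # only non-zero entries contribute
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    acc_w_z[z][w] += resp
                    acc_z_d[d][z] += resp
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


if __name__ == "__main__":
    toy = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 6]]
    topics, mixtures = plsa_em(toy, n_topics=2)
    for row in mixtures:
        print([round(p, 3) for p in row])
```

Each iteration visits every non-zero entry of the co-occurrence matrix once per topic, which is the linear per-iteration cost the abstract refers to when it says the proposed re-estimation matches the original PLSA.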
Keywords :
information retrieval; learning (artificial intelligence); matrix decomposition; sparse matrices; visual databases; cooccurrence matrix; expectation maximization process; factorized matrices; probabilistic latent semantic analysis; probabilistic mixture decomposition; sparsity control; topic modeling; data-adaptive representations; topic model; plsa; sparsity; unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.136
Filename :
5694059