DocumentCode :
972266
Title :
Adaptive Bayesian Latent Semantic Analysis
Author :
Chien, Jen-Tzung ; Wu, Meng-Sung
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
Volume :
16
Issue :
1
fYear :
2008
Firstpage :
198
Lastpage :
207
Abstract :
Due to the vast growth of data collections, statistical document modeling has become increasingly important in language processing. Probabilistic latent semantic analysis (PLSA) is a popular approach in which semantics and statistics can be effectively captured for modeling. However, PLSA is highly sensitive to the task domain, which changes continuously in real-world documents. In this paper, a novel Bayesian PLSA framework is presented. We focus on an incremental learning algorithm that solves the problem of updating the model with articles from new domains. This algorithm is developed to improve document modeling by incrementally extracting up-to-date latent semantic information to match the changing domains at run time. Because the priors of the PLSA parameters are represented by Dirichlet densities, the posterior densities belong to the same distribution family, so that a reproducible prior/posterior mechanism is activated for incremental learning from constantly accumulating documents. An incremental PLSA algorithm is constructed to accomplish the parameter estimation as well as the hyperparameter updating. Compared to standard PLSA with maximum likelihood estimation, the proposed approach is capable of performing dynamic document indexing and modeling. We also present the maximum a posteriori PLSA for corrective training. Experiments on information retrieval and document categorization demonstrate the superiority of the Bayesian PLSA methods.
Keywords :
Bayes methods; computational linguistics; learning (artificial intelligence); natural language processing; probability; text analysis; Dirichlet densities; adaptive Bayesian PLSA; incremental learning algorithm; parameter estimation; probabilistic latent semantic analysis; statistical document modeling; Bayesian methods; Data mining; Frequency; Indexing; Information retrieval; Matrix decomposition; Maximum likelihood estimation; Natural languages; Statistical analysis; Bayesian theory; Dirichlet distribution; conjugate prior; incremental learning
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.909452
Filename :
4381232