  • DocumentCode
    972266
  • Title
    Adaptive Bayesian Latent Semantic Analysis
  • Author
    Chien, Jen-Tzung; Wu, Meng-Sung

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
  • Volume
    16
  • Issue
    1
  • fYear
    2008
  • Firstpage
    198
  • Lastpage
    207
  • Abstract
    Due to the vast growth of data collections, statistical document modeling has become increasingly important in language processing. Probabilistic latent semantic analysis (PLSA) is a popular approach in which both semantics and statistics can be effectively captured for modeling. However, PLSA is highly sensitive to the task domain, which changes continuously in real-world documents. In this paper, a novel Bayesian PLSA framework is presented. We focus on an incremental learning algorithm that solves the problem of updating the model with articles from new domains. The algorithm improves document modeling by incrementally extracting up-to-date latent semantic information to match the changing domains at run time. By representing the priors of the PLSA parameters with Dirichlet densities, the posterior densities belong to the same distribution family, so a reproducible prior/posterior mechanism is activated for incremental learning from continually accumulated documents. An incremental PLSA algorithm is constructed to perform both parameter estimation and hyperparameter updating. Compared with standard PLSA using maximum likelihood estimation, the proposed approach is capable of dynamic document indexing and modeling. We also present maximum a posteriori (MAP) PLSA for corrective training. Experiments on information retrieval and document categorization demonstrate the superiority of the Bayesian PLSA methods.
  • Keywords
    Bayes methods; computational linguistics; learning (artificial intelligence); natural language processing; probability; text analysis; Dirichlet densities; adaptive Bayesian PLSA; incremental learning algorithm; parameter estimation; probabilistic latent semantic analysis; statistical document modeling; Bayesian methods; Data mining; Frequency; Indexing; Information retrieval; Matrix decomposition; Maximum likelihood estimation; Natural languages; Statistical analysis; Bayesian theory; Dirichlet distribution; conjugate prior; incremental learning
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2007.909452
  • Filename
    4381232
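The abstract's "reproducible prior/posterior mechanism" can be illustrated with a small sketch. The code below is not the authors' algorithm. It is a minimal MAP-EM for PLSA, assuming Dirichlet priors with hyperparameters `alpha[z][w]` (each >= 1) on P(w|z) and a symmetric hyperparameter `beta` (>= 1) on each P(z|d). After each batch, the hyperparameters are carried forward as prior plus expected counts. This is the standard conjugate Dirichlet-multinomial update, with EM-expected (fractional) counts standing in for observed counts in a quasi-Bayes style. The function names `map_plsa` and `update_hyperparameters`, the `beta` parameter, and the toy data are all hypothetical.

```python
import random


def normalize(values):
    """Scale non-negative numbers to sum to one (uniform if they are all zero)."""
    total = sum(values)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def map_plsa(counts, alpha, beta=1.0, n_iter=30, seed=0):
    """Hypothetical MAP-EM sketch for PLSA with Dirichlet priors.

    counts: D x V word-count matrix (list of lists) for the current batch.
    alpha:  K x V Dirichlet hyperparameters (each >= 1) for P(w|z).
    beta:   symmetric Dirichlet hyperparameter (>= 1) for each P(z|d).
    Returns (p_w_z, p_z_d, n_wz); n_wz[z][w] holds the expected count of
    word w assigned to topic z from the final E-step.
    """
    rng = random.Random(seed)
    n_docs, n_words, n_topics = len(counts), len(counts[0]), len(alpha)
    # Random positive initialization of both conditional distributions.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    n_wz = [[0.0] * n_words for _ in range(n_topics)]
    for _ in range(n_iter):
        n_wz = [[0.0] * n_words for _ in range(n_topics)]
        n_dz = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w|z) * P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    share = c * joint[z] / total
                    n_wz[z][w] += share
                    n_dz[d][z] += share
        # M-step: posterior modes, so each prior adds (hyperparameter - 1)
        # pseudo-counts; alpha = beta = 1 reduces this to maximum likelihood.
        p_w_z = [normalize([n_wz[z][w] + alpha[z][w] - 1.0 for w in range(n_words)])
                 for z in range(n_topics)]
        p_z_d = [normalize([n_dz[d][z] + beta - 1.0 for z in range(n_topics)])
                 for d in range(n_docs)]
    return p_w_z, p_z_d, n_wz


def update_hyperparameters(alpha, expected_counts):
    """Conjugate-style step: posterior hyperparameters become the next batch's prior."""
    return [[a + n for a, n in zip(a_row, n_row)]
            for a_row, n_row in zip(alpha, expected_counts)]


if __name__ == "__main__":
    # Two toy batches over a 4-word vocabulary with K = 2 topics (hypothetical data).
    batches = [[[5, 4, 0, 0], [0, 0, 6, 3]], [[4, 5, 1, 0], [0, 1, 3, 5]]]
    alpha = [[1.0] * 4 for _ in range(2)]  # flat prior for the first batch
    for batch in batches:
        p_w_z, p_z_d, n_wz = map_plsa(batch, alpha)
        alpha = update_hyperparameters(alpha, n_wz)
    print([[round(p, 3) for p in row] for row in p_w_z])
```

In this sketch only the P(w|z) hyperparameters are carried across batches. Because the Dirichlet prior is conjugate to the multinomial, each batch's posterior has the same form as its prior and can be reused directly. Any forgetting or weighting of old statistics is left out, and how the paper handles document-side parameters for new articles is not reproduced here.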