DocumentCode :
3166310
Title :
Bayesian Folding-In with Dirichlet Kernels for PLSI
Author :
Hinneburg, Alexander ; Gabriel, Hans-Henning ; Gohr, André
Author_Institution :
Martin-Luther-Univ., Halle-Wittenberg
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
499
Lastpage :
504
Abstract :
Probabilistic latent semantic indexing (PLSI) represents documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation maximization (EM) algorithm. New documents or queries need to be folded into the latent topic space by a simplified version of the EM algorithm. During PLSI folding-in of a new document, the topic mixtures of the known documents are ignored. This may lead to a suboptimal model of the extended collection. Our new approach incorporates the topic mixtures of the known documents in a Bayesian way during folding-in. That knowledge is modeled as a prior distribution over the topic simplex using a kernel density estimate of Dirichlet kernels. We demonstrate the advantages of the new Bayesian folding-in using real text data.
Keywords :
Bayes methods; document handling; expectation-maximisation algorithm; indexing; probability; Bayesian folding-in; Dirichlet kernels; PLSI-folding-in; expectation maximization algorithm; known documents; latent topics; probabilistic latent semantic indexing; Bayesian methods; Biochemistry; Costs; Data mining; Graphical models; Indexing; Kernel; Linear discriminant analysis; Runtime; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3018-5
Type :
conf
DOI :
10.1109/ICDM.2007.15
Filename :
4470280