DocumentCode
3166310
Title
Bayesian Folding-In with Dirichlet Kernels for PLSI
Author
Hinneburg, Alexander ; Gabriel, Hans-Henning ; Gohr, André
Author_Institution
Martin-Luther-Univ., Halle-Wittenberg
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
499
Lastpage
504
Abstract
Probabilistic latent semantic indexing (PLSI) represents the documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation-maximization (EM) algorithm. New documents or queries must be folded into the latent topic space by a simplified version of the EM algorithm. During PLSI folding-in of a new document, the topic mixtures of the known documents are ignored. This may lead to a suboptimal model of the extended collection. Our new approach incorporates the topic mixtures of the known documents in a Bayesian way during folding-in. That knowledge is modeled as a prior distribution over the topic simplex using a kernel density estimate of Dirichlet kernels. We demonstrate the advantages of the new Bayesian folding-in on real text data.
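The two folding-in procedures contrasted in the abstract can be illustrated with a short sketch. `plsi_fold_in` is the standard PLSI folding-in: P(w|z) stays fixed and only P(z|d_new) is re-estimated by EM. `bayesian_fold_in` is a hypothetical MAP variant that adds a prior built from Dirichlet kernels centered on the known documents' topic mixtures. The kernel construction `alpha_j = 1 + s * theta_j`, the concentration parameter `s`, the equal kernel weights, and all function names are assumptions made for illustration; this is not the authors' exact formulation.

```python
import numpy as np
from math import lgamma


def expected_topic_counts(theta, phi, counts):
    """E-step of folding-in: sum_w n(d,w) P(z|d,w), with P(w|z) = phi held fixed."""
    joint = theta[:, None] * phi                      # (K, V): P(z|d) P(w|z)
    post = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-300)
    return (post * counts[None, :]).sum(axis=1)       # (K,) expected topic counts


def plsi_fold_in(counts, phi, n_iter=200):
    """Standard PLSI folding-in: maximum-likelihood P(z|d_new); known documents ignored."""
    theta = np.full(phi.shape[0], 1.0 / phi.shape[0])  # uniform start
    for _ in range(n_iter):
        theta = expected_topic_counts(theta, phi, counts)
        theta /= theta.sum()                           # M-step: renormalize
    return theta


def log_dirichlet(theta, alphas):
    """Log-density of theta under each Dirichlet kernel (one row of alphas per kernel)."""
    log_norm = np.array([lgamma(a.sum()) - sum(lgamma(x) for x in a) for a in alphas])
    log_theta = np.log(np.clip(theta, 1e-300, None))
    return log_norm + ((alphas - 1.0) * log_theta).sum(axis=1)


def bayesian_fold_in(counts, phi, known_thetas, s=50.0, n_iter=200):
    """Hypothetical MAP folding-in under an equally weighted mixture of Dirichlet kernels.

    Each kernel is centered on a known document's topic mixture via
    alpha_j = 1 + s * theta_j (s is an assumed concentration parameter).
    """
    alphas = 1.0 + s * np.asarray(known_thetas, dtype=float)  # (J, K)
    theta = np.full(phi.shape[0], 1.0 / phi.shape[0])
    for _ in range(n_iter):
        # Kernel-level E-step: responsibility of each kernel for the current estimate.
        lp = log_dirichlet(theta, alphas)
        r = np.exp(lp - lp.max())
        r /= r.sum()
        # M-step: word evidence plus pseudo-counts (alpha - 1) from responsible kernels.
        theta = expected_topic_counts(theta, phi, counts) + r @ (alphas - 1.0)
        theta /= theta.sum()
    return theta
```

With two topics where words 0 and 1 favor topic 0 and the counts `[8, 2, 0, 0]`, `plsi_fold_in` drives P(z=0|d) toward 1, whereas `bayesian_fold_in` with a single known mixture `(0.3, 0.7)` keeps it below 0.5 because the kernel contributes pseudo-counts toward that mixture. Because `alpha_j >= 1`, the pseudo-counts are non-negative and each iteration stays on the topic simplex.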
Keywords
Bayes methods; document handling; expectation-maximisation algorithm; indexing; probability; Bayesian folding-in; Dirichlet kernels; PLSI-folding-in; expectation maximization algorithm; known documents; latent topics; probabilistic latent semantic indexing; Bayesian methods; Biochemistry; Costs; Data mining; Graphical models; Indexing; Kernel; Linear discriminant analysis; Runtime; Text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location
Omaha, NE
ISSN
1550-4786
Print_ISBN
978-0-7695-3018-5
Type
conf
DOI
10.1109/ICDM.2007.15
Filename
4470280