DocumentCode
3756825
Title
Mining Clusters in XML Corpora Based on Bayesian Generative Topic Modeling
Author
Gianni Costa;Riccardo Ortale
Author_Institution
ICAR Inst., Rende, Italy
fYear
2015
Firstpage
515
Lastpage
520
Abstract
We study XML partitioning via unsupervised topic modeling. We propose a new mixed-membership Bayesian generative model of the latent topics in XML corpora. Approximate posterior inference and parameter estimation are derived for this XML topic model and implemented as a Gibbs sampling algorithm, which infers the topic distributions of the input XML documents. These distributions are then used to partition the XML corpus by latent-topic similarity. Experiments on real-world XML corpora show that the approach is more effective than several state-of-the-art competitors.
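This is a rough illustration only. It does not reproduce the paper's XML topic model or its sampler. The sketch substitutes standard LDA with collapsed Gibbs sampling. It assumes each XML document has already been reduced to a list of integer token ids. One hypothetical encoding would use ids of tag names or root-to-leaf paths. The code estimates per-document topic distributions, then assigns each document to its most probable topic. That is one simple way to partition a corpus by latent-topic similarity. The names `lda_gibbs` and `partition_by_topic` and the default hyperparameters are illustrative assumptions.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a stand-in, NOT the paper's XML model).

    docs: list of lists of integer token ids in [0, vocab_size).
    Returns theta, an (n_docs, n_topics) array of document-topic distributions.
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))       # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))   # token counts per topic
    nk = np.zeros(n_topics)                  # total tokens per topic
    z = []                                   # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = k | all other assignments), up to a constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic distribution.
    return (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)


def partition_by_topic(theta):
    """Hard-assign each document to its most probable topic (one cluster per topic)."""
    return theta.argmax(axis=1)
```

For example, `theta = lda_gibbs([[0, 1, 2, 0], [3, 4, 5, 3]], n_topics=2, vocab_size=6)` followed by `partition_by_topic(theta)` yields one cluster label per document. The argmax step is the simplest hard partition. The paper's own procedure for separating the inferred distributions may differ.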
Keywords
"XML","Vegetation","Adaptation models","Probability distribution","Bayes methods","Parameter estimation","Probabilistic logic"
Publisher
ieee
Conference_Titel
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type
conf
DOI
10.1109/ICMLA.2015.148
Filename
7424368