Title :
Mining Clusters in XML Corpora Based on Bayesian Generative Topic Modeling
Author :
Gianni Costa; Riccardo Ortale
Author_Institution :
ICAR Inst., Rende, Italy
Abstract :
We study XML partitioning via unsupervised topic modeling. A new mixed-membership Bayesian generative model of the latent topics in XML corpora is proposed. Approximate posterior inference and parameter estimation are derived for the devised XML topic model and implemented by a Gibbs sampling algorithm, which is used to infer the topic distributions of the input XML documents. These distributions are, in turn, used to partition the whole XML corpus by latent-topic similarity. Experiments on real-world XML corpora show superior effectiveness compared with several state-of-the-art competitors.
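As a rough illustration of the pipeline outlined above (Gibbs sampling to infer per-document topic distributions, followed by partitioning the corpus according to those distributions), here is a minimal Python sketch. It is not the authors' model: it swaps in standard LDA with collapsed Gibbs sampling in place of the paper's XML-specific mixed-membership model, assumes each XML document has already been flattened into a bag of tokens (for example, hypothetical tag-path strings), and partitions documents by their dominant topic as a simple stand-in for latent-topic similarity. The function names `gibbs_lda` and `partition_by_dominant_topic` are hypothetical.

```python
import random


def gibbs_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a stand-in, NOT the paper's XML model).

    docs: list of token lists (assumed: XML documents already flattened to tokens).
    Returns theta, the estimated per-document topic distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    ndk = [[0] * n_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # total tokens per topic
    z = []                                     # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic distribution.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(ndk[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


def partition_by_dominant_topic(theta):
    """Group document indices by their most probable topic (a simple partitioning rule)."""
    clusters = {}
    for d, dist in enumerate(theta):
        k = max(range(len(dist)), key=dist.__getitem__)
        clusters.setdefault(k, []).append(d)
    return clusters


if __name__ == "__main__":
    # Toy corpus of hypothetical tag-path tokens.
    docs = [
        ["book/title", "book/author"] * 3,
        ["movie/director", "movie/cast"] * 3,
        ["book/title", "book/isbn"] * 3,
    ]
    theta = gibbs_lda(docs, n_topics=2, n_iters=50)
    print(partition_by_dominant_topic(theta))
```

The dominant-topic rule is only the simplest way to turn topic distributions into a partition; comparing whole distributions (for instance, clustering the `theta` vectors under a divergence measure) would be closer in spirit to grouping by latent-topic similarity.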
Keywords :
"XML","Vegetation","Adaptation models","Probability distribution","Bayes methods","Parameter estimation","Probabilistic logic"
Conference_Title :
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
DOI :
10.1109/ICMLA.2015.148