• DocumentCode
    3756825
  • Title
    Mining Clusters in XML Corpora Based on Bayesian Generative Topic Modeling
  • Author
    Gianni Costa; Riccardo Ortale
  • Author_Institution
    ICAR Inst., Rende, Italy
  • fYear
    2015
  • Firstpage
    515
  • Lastpage
    520
  • Abstract
    We study XML partitioning via unsupervised topic modeling. A new mixed-membership Bayesian generative model of the latent topics in XML corpora is proposed. Approximate posterior inference and parameter estimation are derived for the devised XML topic model and implemented by a Gibbs sampling algorithm, which is used to infer the topic distributions of the input XML documents. These distributions are then used to partition the whole XML corpus by latent-topic similarity. Experiments on real-world XML corpora show higher effectiveness than several state-of-the-art competitors.
  • Keywords
    "XML","Vegetation","Adaptation models","Probability distribution","Bayes methods","Parameter estimation","Probabilistic logic"
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICMLA.2015.148
  • Filename
    7424368