  • DocumentCode
    1734182
  • Title
    A Latent Semantic Approach to XML Clustering by Content and Structure Based on Non-negative Matrix Factorization
  • Author
    Costa, Gianni; Ortale, Riccardo
  • Author_Institution
    ICAR, Rende, Italy
  • Volume
    1
  • fYear
    2013
  • Firstpage
    179
  • Lastpage
    184
  • Abstract
    Non-negative matrix factorization is widely used in text clustering. We investigate its use in the XML domain to cluster XML documents by structure and content into topically homogeneous groups. Non-negative matrix factorization is performed through an alternating least squares method that incorporates techniques to reduce the cost of large-scale factorizations. This is especially relevant when massive text-centric XML corpora are processed. Empirical evidence from a comparative evaluation on real-world XML corpora shows that our approach outperforms several state-of-the-art competitors in effectiveness.
  • Keywords
    XML; least squares approximations; matrix decomposition; pattern clustering; text analysis; XML clustering; XML documents; alternating least squares method; latent semantic approach; nonnegative matrix factorization; text clustering; Electronic publishing; Encyclopedias; Internet; Semantics; Vegetation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2013 12th International Conference on
  • Conference_Location
    Miami, FL
  • Type
    conf
  • DOI
    10.1109/ICMLA.2013.38
  • Filename
    6784608
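The abstract states that the factorization is computed by alternating least squares. As an illustration only, the Python sketch below shows the basic projected ALS scheme for NMF. Each factor is solved by unconstrained least squares while the other is held fixed, and then its negative entries are set to zero. The sketch also shows one common way to read cluster labels from the factors. It assumes NumPy, a hypothetical toy feature-by-document matrix, and the hypothetical function name `als_nmf`. It omits the large-scale expedients the authors mention and does not reproduce their representation of XML content and structure.

```python
import numpy as np


def als_nmf(X, k, n_iter=100, seed=0):
    """Approximate a non-negative X (m x n) as W @ H, with W (m x k) and H (k x n) >= 0.

    Generic projected alternating least squares, not the paper's variant:
    solve an unconstrained least-squares problem for one factor with the
    other fixed, then clip negative entries to zero.
    """
    rng = np.random.default_rng(seed)
    m, _ = X.shape
    W = rng.random((m, k))  # random non-negative initialization
    for _ in range(n_iter):
        # Fix W, solve min_H ||X - W H||_F, then project onto H >= 0.
        H = np.linalg.lstsq(W, X, rcond=None)[0]
        H = np.maximum(H, 0.0)
        # Fix H, solve min_W ||X^T - H^T W^T||_F, then project onto W >= 0.
        W = np.linalg.lstsq(H.T, X.T, rcond=None)[0].T
        W = np.maximum(W, 0.0)
    return W, H


# Hypothetical toy matrix: rows are features (e.g. term or tag-path
# occurrences), columns are documents. Documents 0-2 share features 0-2,
# and documents 3-5 share features 3-5.
X = np.zeros((6, 6))
X[:3, :3] = 1.0
X[3:, 3:] = 1.0

W, H = als_nmf(X, k=2)
# Assign each document (column) to the latent topic with the largest weight.
labels = H.argmax(axis=0)
print(labels)
```

Assigning each document to the row of `H` with the largest coefficient is a common NMF clustering convention and is assumed here. Projected ALS is simple and fast. However, unlike exact alternating non-negative least squares, it does not guarantee that the reconstruction error decreases at every iteration.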