  • DocumentCode
    2533284
  • Title
    A Hybrid Method for XML Clustering
  • Author
    Piao, Yong; Liu, Chen; Wang, Xiu-Kun
  • Author_Institution
    Sch. of Software, Dalian Univ. of Technol., Dalian, China
  • fYear
    2010
  • fDate
    18-20 Dec. 2010
  • Firstpage
    286
  • Lastpage
    290
  • Abstract
    An effective XML clustering method, the neighbor center clustering algorithm (NCC), is presented in this paper; it measures the similarity between XML files using both the structural and the content information they contain. Structural similarity is measured using the Longest Common Subsequence (LCS), while content similarity is computed using TF-IDF weighting. NCC reduces computational complexity by avoiding a direct search for cluster centers. Experiments show that NCC achieves high purity and F-measure values and is applicable to clustering XML documents with both homogeneous and heterogeneous structures.
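The abstract names two similarity components: LCS for structure and TF-IDF for content. The following is a minimal, non-authoritative Python sketch of those two building blocks. It does not reproduce the paper's formulas or the NCC clustering step itself. Assumptions not stated in the abstract: each document's structure is represented as a sequence of tag names, structural similarity is LCS length divided by the longer sequence length, content similarity is cosine similarity of TF-IDF vectors, and the two are blended with a hypothetical weight `alpha`.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (classic DP, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best of left/up.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structural_similarity(tags_a, tags_b):
    """Hypothetical normalisation: LCS length over the longer tag sequence."""
    if not tags_a and not tags_b:
        return 1.0
    return lcs_length(tags_a, tags_b) / max(len(tags_a), len(tags_b))


def tfidf_vectors(docs):
    """TF-IDF weights for tokenised documents: raw term count times log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_similarity(s_struct, s_content, alpha=0.5):
    """Hypothetical linear blend of structural and content similarity; alpha is an assumption."""
    return alpha * s_struct + (1 - alpha) * s_content


if __name__ == "__main__":
    s = structural_similarity(["book", "title", "author", "year"],
                              ["book", "title", "publisher", "year"])
    vecs = tfidf_vectors([["xml", "cluster", "tree"], ["xml", "cluster", "lcs"], ["music", "audio"]])
    print(s, cosine(vecs[0], vecs[1]), combined_similarity(s, cosine(vecs[0], vecs[1])))
```

In this sketch the two tag sequences share `book`, `title`, `year`, so their structural similarity is 3/4. The pairwise scores could then feed any clustering step. How NCC actually selects neighbor centers is not described in this record.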
  • Keywords
    XML; computational complexity; data mining; pattern clustering; text analysis; F-measure value; TF-IDF principle; XML clustering; XML file; computation complexity; content information; extensible markup language; longest common subsequence; neighbor center clustering; structural information; text mining; Accuracy; Clustering algorithms; Data mining; Equations; Feature extraction; Mathematical model; XML; Longest Common Subsequence; neighbor center cluster; structural similarity;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-9482-8
  • Type
    conf
  • DOI
    10.1109/PAAP.2010.55
  • Filename
    5715096