• DocumentCode
    2739926
  • Title
    A Methodology for Clustering XML Documents Based on Labeled Tree
  • Author
    Liu, Lei ; Zheng, Yongqing ; Ding, Baoshi ; Liu, Haiyan
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    397
  • Lastpage
    401
  • Abstract
    The number of XML documents is increasing rapidly. To analyze the information represented in XML documents efficiently, research on XML document clustering is actively in progress. The key issue is how to devise a similarity measure between XML documents for use in clustering. Because XML documents have a hierarchical structure, it is not appropriate to cluster them with a general document similarity measure. In this paper, we propose a novel similarity calculation measure that reduces nesting and repetition across the whole XML document. We then propose an improved edge-set comparison algorithm to calculate the similarity of two XML documents. Our experiments show that the proposed method improves clustering accuracy compared with previous work.
  • Keywords
    XML; pattern clustering; XML documents; document clustering; document similarity measure; edge-set comparison algorithm; hypermedia markup language; labeled tree; nesting reduction measure; repeating reduction method; Clustering algorithms; Computer science; Educational institutions; Fuzzy systems; Information analysis; Information retrieval; Knowledge management; Management information systems; Measurement standards; Clustering; Data mining; Semi-structured data; Structural similarity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type
    conf
  • DOI
    10.1109/FSKD.2009.181
  • Filename
    5358550
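The abstract does not spell out the exact reduction rules or what makes the edge-set comparison "improved". As a rough illustration only, the Python sketch below assumes commonly used definitions: nesting reduction folds an element into its nearest ancestor that has the same tag, repetition reduction merges sibling elements that share a tag, and similarity is the Jaccard overlap of the resulting (parent tag, child tag) edge sets. The function names `reduce_tree`, `edge_set`, `xml_edges`, and `similarity` are hypothetical, and this is a plain baseline, not the authors' improved algorithm.

```python
import xml.etree.ElementTree as ET


def reduce_tree(root):
    """Return (root_label, summary); summary maps child label -> subtree dict."""
    summary = {}

    def visit(elem, path):
        # path: list of (label, children_dict) from the root down to the node
        # that currently stands for `elem` in the reduced tree.
        for child in elem:
            label = child.tag
            # Nesting reduction (assumed rule): if an ancestor on the path has
            # the same label, fold the child into the nearest such ancestor.
            match = next((i for i in range(len(path) - 1, -1, -1)
                          if path[i][0] == label), None)
            if match is not None:
                visit(child, path[:match + 1])
                continue
            # Repetition reduction: siblings sharing a label share one dict
            # entry, so their children are merged into a single subtree.
            node = path[-1][1].setdefault(label, {})
            visit(child, path + [(label, node)])

    visit(root, [(root.tag, summary)])
    return root.tag, summary


def edge_set(label, children, edges):
    """Collect (parent_label, child_label) pairs from a reduced tree."""
    for child_label, grandchildren in children.items():
        edges.add((label, child_label))
        edge_set(child_label, grandchildren, edges)
    return edges


def xml_edges(text):
    root_label, summary = reduce_tree(ET.fromstring(text))
    # Pseudo-edge for the root, so documents with different root tags
    # never compare as identical.
    return edge_set(root_label, summary, {(None, root_label)})


def similarity(doc_a, doc_b):
    """Jaccard overlap of the two reduced edge sets (baseline measure)."""
    ea, eb = xml_edges(doc_a), xml_edges(doc_b)
    return len(ea & eb) / len(ea | eb)


if __name__ == "__main__":
    a = "<lib><book><title/><author/><author/></book><book><title/></book></lib>"
    b = "<lib><book><title/><section><section><para/></section></section></book></lib>"
    print(similarity(a, b))  # 0.5
```

In the usage example, the repeated `book` and `author` siblings in `a` collapse into single nodes, and the nested `section` in `b` is folded into its outer `section`, leaving 3 shared edges out of 6 distinct ones. Storing children in dicts keyed by tag makes repetition reduction a side effect of `setdefault`, which keeps the sketch short.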