• DocumentCode
    2582085
  • Title
    Finding Syntactic Similarities Between XML Documents
  • Author
    Rafiei, Davood ; Moise, Daniel L. ; Sun, Dabo
  • Author_Institution
    Alberta Univ., Edmonton, Alta.
  • fYear
    0
  • fDate
    0-0 0
  • Firstpage
    512
  • Lastpage
    516
  • Abstract
    Detecting structural similarities between XML documents has been the subject of several recent works, and the proposed algorithms mostly use the tree edit distance between the corresponding trees of the XML documents. However, evaluating a tree edit distance is computationally expensive and does not easily scale up to large collections. We show in this paper that a tree edit distance computation is often unnecessary and can be avoided. In particular, we propose a concise structural summary of XML documents and show that a comparison based on this summary is both fast and effective. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some of the previously proposed solutions based on a tree comparison. Furthermore, the time complexity of the algorithm is linear in the size of the structural description.
  • Keywords
    XML; computational complexity; tree data structures; XML document; structural similarities; syntactic similarities; time complexity; tree edit distance; Data mining; Expert systems; Indexing; Query processing; Relational databases; Scattering; Sun; Tree graphs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2006. DEXA '06. 17th International Workshop on
  • Conference_Location
    Krakow
  • ISSN
    1529-4188
  • Print_ISBN
    0-7695-2641-1
  • Type
    conf
  • DOI
    10.1109/DEXA.2006.62
  • Filename
    1698396