• DocumentCode
    2563108
  • Title
    Change discovery of hierarchically structured, order-sensitive data in HTML/XML documents
  • Author
    Lim, SeungJin; Ng, Yiu-Kai
  • Author_Institution
    Dept. of Comput. Sci., Utah State Univ., Logan, UT, USA
  • fYear
    2004
  • fDate
    2004
  • Firstpage
    178
  • Lastpage
    187
  • Abstract
    As hierarchically structured, order-sensitive HTML/XML data become more prevalent in online data exchange and processing, discovering changes in these data is essential to Web data processing, especially when the data evolve frequently over time. We propose a change-discovery algorithm (CDA) for any two HTML/XML documents, each of which is hierarchically structured and represented as an ordered tree. The novelties of CDA include (i) the use of weighted sequence difference to determine the edit script with the anticipated minimal operational cost and (ii) the generation of the minimal contextual differences of branches in the two given trees. Unlike existing change-detection approaches that adopt node-to-node comparisons, CDA adopts branch-to-branch comparisons. Edit scripts generated by CDA can be processed in any order to yield the same result, which enhances parallelism. CDA also guarantees lossless reversal transformation. The time complexity of CDA is polynomial in the number of branches of the two given trees.
  • Keywords
    computational complexity; data mining; hypermedia markup languages; trees (mathematics); HTML documents; Web data processing; XML documents; branch-to-branch comparisons; change discovery; change-discovery algorithm; node-to-node comparisons; online data exchanging; online data processing; ordered tree; polynomial time complexity; weighted sequence difference; Biomedical monitoring; Computer science; Costs; Data processing; Finance; Fires; HTML; Parallel processing; Polynomials; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications and the Internet, 2004. Proceedings. 2004 International Symposium on
  • Print_ISBN
    0-7695-2068-5
  • Type
    conf
  • DOI
    10.1109/SAINT.2004.1266114
  • Filename
    1266114