• DocumentCode
    2548249
  • Title
    Approximate Validity of XML Streaming Data
  • Author
    Cheng, Huang; Jun, Li; de Rougemont, M.
  • Author_Institution
    Univ. Paris-Sud, Orsay
  • fYear
    2008
  • fDate
    20-22 July 2008
  • Firstpage
    149
  • Lastpage
    156
  • Abstract
    We present a SAX implementation of the statistical embedding associated with XML data, introduced in [1], [2], which allows one to efficiently decide ε-validity with respect to any DTD or Schema, for the Edit Distance with Moves. It associates a generalized k-gram with unranked labelled trees (with k = 1/ε), from which any regular property can be approximately decided. We show how to compute the k-gram exactly with a SAX implementation using a memory of size d, the depth of the tree, and how to compute an approximate k-gram with queues of size M = 2k and a global memory of size 2k in the worst case. Experiments on large XML files from the XML benchmark project confirm the error analysis for various values of M.
  • Keywords
    XML; approximation theory; statistical analysis; tree data structures; SAX implementation; approximate XML streaming data validity; generalized k-gram labelled tree; statistical data embedding; unranked labelled tree; Benchmark testing; Error analysis; Information management; Sampling methods; Scalability; Search problems; Tree graphs; Virtual manufacturing; Web mining; XML Approximation Web-mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
  • Conference_Location
    Zhangjiajie Hunan
  • Print_ISBN
    978-0-7695-3185-4
  • Electronic_ISBN
    978-0-7695-3185-4
  • Type
    conf
  • DOI
    10.1109/WAIM.2008.97
  • Filename
    4597008
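
The streaming idea in the abstract (one SAX pass, memory bounded by the tree depth d) can be illustrated with a minimal sketch using Python's standard `xml.sax` module. This is not the generalized k-gram of the paper, whose definition is not given in this record. As a simplified stand-in, the hypothetical `PathGramHandler` below counts k-grams of tag names along root-to-node paths, and the helper `path_gram_frequencies` normalizes the counts into frequencies. Both names are assumptions made for illustration.

```python
import xml.sax
from collections import Counter


class PathGramHandler(xml.sax.ContentHandler):
    """Counts k-grams of tag names along root-to-node paths.

    Simplified stand-in for illustration only, NOT the paper's
    generalized k-gram embedding.
    """

    def __init__(self, k):
        super().__init__()
        self.k = k
        self.stack = []        # names of currently open elements; at most d entries
        self.max_depth = 0     # largest stack size seen, i.e. the tree depth d
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.max_depth = max(self.max_depth, len(self.stack))
        # Record the k most recent ancestors-or-self as one k-gram.
        if len(self.stack) >= self.k:
            self.counts[tuple(self.stack[-self.k:])] += 1

    def endElement(self, name):
        self.stack.pop()


def path_gram_frequencies(xml_text, k):
    """Parse xml_text in one SAX pass; return (k-gram frequencies, depth)."""
    handler = PathGramHandler(k)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    total = sum(handler.counts.values())
    freqs = {g: c / total for g, c in handler.counts.items()} if total else {}
    return freqs, handler.max_depth


if __name__ == "__main__":
    doc = "<a><b><c/></b><b><c/><c/></b></a>"
    freqs, depth = path_gram_frequencies(doc, k=2)
    print(depth, freqs)
```

The only per-document working state is the stack of open elements, so working memory grows with the depth of the tree rather than with the size of the file. The counter of distinct k-grams is additional state that the sketch does not bound.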