DocumentCode
2548249
Title
Approximate Validity of XML Streaming Data
Author
Cheng, Huang ; Jun, Li ; de Rougemont, M.
Author_Institution
Univ. Paris-Sud, Orsay
fYear
2008
fDate
20-22 July 2008
Firstpage
149
Lastpage
156
Abstract
We present a SAX implementation of the statistical embedding associated with XML data, introduced in [1], [2], which makes it possible to efficiently decide ε-validity with respect to any DTD or Schema, for the Edit Distance with Moves. The embedding associates a generalized k-gram with unranked labelled trees (with k = 1/ε), from which any regular property can be approximately decided. We show how to compute the k-gram exactly with a SAX implementation using a memory of size d, the depth of the tree, and how to compute an approximate k-gram with queues of size M = 2k and a global memory of size 2k in the worst case. Experiments on large XML files from the XML benchmark project confirm the error analysis for various values of M.
Keywords
XML; approximation theory; statistical analysis; tree data structures; SAX implementation; approximate XML streaming data validity; generalized k-gram labelled tree; statistical data embedding; unranked labelled tree; Benchmark testing; Error analysis; Information management; Sampling methods; Scalability; Search problems; Tree graphs; Virtual manufacturing; Web mining; XML; XML Approximation Web-mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location
Zhangjiajie, Hunan
Print_ISBN
978-0-7695-3185-4
Electronic_ISBN
978-0-7695-3185-4
Type
conf
DOI
10.1109/WAIM.2008.97
Filename
4597008