Title :
Schemas Extraction for XML Documents by XML Element Sequence Patterns
Author :
Zhang Haiwei ; Yuan Xiaojie
Author_Institution :
Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin, China
Abstract :
XML is the de facto standard format for data exchange and manipulation of structured documents. An XML schema provides important structural information about XML documents. Unfortunately, much XML data either has no XML schema or is not accompanied by one. To gain the benefits of a schema for such documents, the schema must be extracted from the documents themselves. This paper presents a model named XML Element Sequence Patterns (XESP) for extracting XML schemas from documents that lack them. XESPs are built from the paths of XML elements and are represented as sequences of XML elements with relations. Experimental results show that extracting XML schemas with XESPs takes less time and memory and achieves higher precision than traditional methods based on ECMs and EPMs.
Keywords :
XML; electronic data interchange; information retrieval; XML document; XML element sequence patterns; XML schema; data exchange manipulation; extensible markup language; Brushless DC motors; Data engineering; Data mining; Databases; Educational institutions; Electrochemical machining; Information science
Conference_Title :
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4909-5
DOI :
10.1109/ICISE.2009.1047