DocumentCode :
416108
Title :
NEXSORT: sorting XML in external memory
Author :
Silberstein, Adam ; Yang, Jun
Author_Institution :
Dept. of Comput. Sci., Duke Univ., Durham, NC, USA
fYear :
2004
fDate :
30 March-2 April 2004
Firstpage :
695
Lastpage :
706
Abstract :
XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, children of every nonleaf element are ordered according to a given sorting criterion. Among NEXSORT\´s uses is in combination with structural merge as the XML version of sort-merge join, which allows us to merge large XML documents using only a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally "easier" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is Θ(max{n,nlogm(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that can fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.
Keywords :
XML; computational complexity; merging; relational databases; sorting; tree data structures; I/O lower bound; Internet; NEXSORT; XML document; external memory; external merge sort; flat file; input document tree; nonleaf element; sort-merge join; theoretical lower bound; Computer science; Data models; Engineering profession; Internet; Law; Legal factors; Merging; Remuneration; Sorting; XML;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN :
1063-6382
Print_ISBN :
0-7695-2065-0
Type :
conf
DOI :
10.1109/ICDE.2004.1320038
Filename :
1320038
Link To Document :
بازگشت