DocumentCode
1917256
Title
Improving Range Query Performance on Historic Web Page Data
Author
Li, Geng ; Peng, Bo
Author_Institution
Lab. of Comput. Networks & Distrib. Syst., Peking Univ., Beijing, China
fYear
2010
fDate
16-18 July 2010
Firstpage
87
Lastpage
91
Abstract
This paper studies the performance of range queries on a historic web page data set, i.e., requests for the subset of records whose URLs and timestamps satisfy the query conditions, issued against a data set that keeps the historic versions of the HTML data of URLs on the web. Because it keeps track of every version of every URL, such a data set can easily scale up to terabytes, so systems that provide query services over it require substantial computing resources. We show that in this scenario the data storage layout has a significant impact on query performance, and we use quantitative approaches to propose storage design principles that improve performance.
Keywords
Internet; hypermedia markup languages; query processing; HTML data; historic web page data; range query performance improvement; Distributed databases; Hard disks; Indexing; Optimization; Web pages; performance optimization; storage design; web-scale data access;
fLanguage
English
Publisher
IEEE
Conference_Titel
ChinaGrid Conference (ChinaGrid), 2010 Fifth Annual
Conference_Location
Guangzhou
Print_ISBN
978-1-4244-7543-8
Electronic_ISBN
978-1-4244-7544-5
Type
conf
DOI
10.1109/ChinaGrid.2010.28
Filename
5563021