• DocumentCode
    1917256
  • Title
    Improving Range Query Performance on Historic Web Page Data
  • Author
    Li, Geng; Peng, Bo
  • Author_Institution
    Lab. of Comput. Networks & Distrib. Syst., Peking Univ., Beijing, China
  • fYear
    2010
  • fDate
    16-18 July 2010
  • Firstpage
    87
  • Lastpage
    91
  • Abstract
    This paper studies the performance of range queries on a historic web page data set, i.e., a data set that records historic versions of the HTML data of URLs on the web. A range query requests the subset of that data whose URLs and timestamps satisfy the query conditions. Keeping track of all versions of every web URL can easily scale the data set up to terabytes, so systems providing query services over such a data set require substantial computing resources. We show that in this scenario the data storage layout has a significant impact on query performance, and we propose storage design principles for performance improvement through quantitative approaches.
  • Keywords
    Internet; hypermedia markup languages; query processing; HTML data; historic web page data; range query performance improvement; Distributed databases; Hard disks; Indexing; Optimization; Web pages; performance optimization; storage design; web-scale data access;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    ChinaGrid Conference (ChinaGrid), 2010 Fifth Annual
  • Conference_Location
    Guangzhou
  • Print_ISBN
    978-1-4244-7543-8
  • Electronic_ISBN
    978-1-4244-7544-5
  • Type
    conf
  • DOI
    10.1109/ChinaGrid.2010.28
  • Filename
    5563021