• DocumentCode
    783651
  • Title

    Automatic fragment detection in dynamic Web pages and its impact on caching

  • Author

    Ramaswamy, Lakshmish ; Iyengar, Arun ; Liu, Ling ; Douglis, Fred

  • Author_Institution
    Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
  • Volume
    17
  • Issue
    6
  • fYear
    2005
  • fDate
    6/1/2005 12:00:00 AM
  • Firstpage
    859
  • Lastpage
    874
  • Abstract
    Constructing Web pages from fragments has been shown to provide significant benefits for both content generation and caching. In order for a Web site to use fragment-based content generation, however, good methods are needed for fragmenting the Web pages. Manual fragmentation of Web pages is expensive, error prone, and unscalable. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in Web sites serving dynamic content. Our approach analyzes Web pages with respect to their information sharing behavior, personalization characteristics, and change patterns. We identify fragments which are shared among multiple documents or have different lifetime or personalization characteristics. Our approach has three unique features. First, we propose a framework for fragment detection, which includes a hierarchical and fragment-aware model for dynamic Web pages and a compact and effective data structure for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that effectively detects fragments based on their lifetime and personalization characteristics. This paper shows the results when the algorithms are applied to real Web sites. We evaluate the proposed scheme through a series of experiments, showing the benefits and costs of the algorithms. We also study the impact of using the fragments detected by our system on key parameters such as disk space utilization, network bandwidth consumption, and load on the origin servers.
  • Keywords
    Internet; cache storage; content management; data structures; Web page construction; Web site; automatic fragment detection; data structure; disk space utilization; dynamic content caching; fragment-based content generation; information sharing behavior; multiple documents; network bandwidth consumption; personalization characteristics; Bandwidth; Data structures; Information analysis; Network servers; Pattern analysis; Publishing; Space technology; Web pages; Web server; Web sites; Dynamic content caching; fragment detection; fragment-based caching
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2005.89
  • Filename
    1423985