• DocumentCode
    935227
  • Title
    A keyword-based semantic prefetching approach in Internet news services
  • Author
    Xu, Cheng-Zhong ; Ibrahim, Tamer I.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Wayne State Univ., Detroit, MI, USA
  • Volume
    16
  • Issue
    5
  • fYear
    2004
  • fDate
    5/1/2004 12:00:00 AM
  • Firstpage
    601
  • Lastpage
    611
  • Abstract
    Prefetching is an important technique to reduce the average Web access latency. Existing prefetching methods are based mostly on URL graphs. They use the graphical nature of HTTP links to determine the possible paths through a hypertext system. Although the URL graph-based approaches are effective in prefetching of frequently accessed documents, few of them can prefetch those URLs that are rarely visited. The paper presents a keyword-based semantic prefetching approach to overcome the limitation. It predicts future requests based on semantic preferences of past retrieved Web documents. We apply this technique to Internet news services and implement a client-side personalized prefetching system: NewsAgent. The system exploits semantic preferences by analyzing keywords in URL anchor text of previously accessed documents in different news categories. It employs a neural network model over the keyword set to predict future requests. The system features a self-learning capability and good adaptability to the change of client surfing interest. NewsAgent does not exploit keyword synonymy for conservativeness in prefetching. However, it alleviates the impact of keyword polysemy by taking into account server-provided categorical information in decision-making and, hence, captures more semantic knowledge than term-document literal matching methods. Experimental results from daily browsing of ABC News, CNN, and MSNBC news sites for a period of three months show an achievement of up to 60 percent hit ratio due to prefetching.
  • Keywords
    Internet; Web sites; learning (artificial intelligence); neural nets; storage management; ABC News; CNN; MSNBC news sites; Internet news services; NewsAgent; URL anchor text; average Web access latency; client surfing interest; client-side personalized prefetching system; future request prediction; keyword polysemy; keyword-based semantic prefetching; neural network model; past retrieved Web documents; personalized news service; self-learning capability; semantic knowledge; semantic preferences; server-provided categorical information; Decision making; Delay; History; Hypertext systems; Network servers; Neural networks; Predictive models; Prefetching; Uniform resource locators; Web and internet services
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2004.1277820
  • Filename
    1277820
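
  • Illustrative sketch
    The abstract describes predicting future requests from keywords in the anchor text of previously accessed documents, scoped by server-provided news category. The following is a minimal sketch of that idea, not the NewsAgent implementation: it assumes a single-layer perceptron with one weight per (category, keyword) pair, and the class name, stopword list, threshold, and learning rate are all hypothetical choices made for illustration. Keying weights by category is one simple way to keep a polysemous keyword from carrying its weight across unrelated categories.

```python
import re

# Hypothetical stopword list; the abstract does not say how keywords are filtered.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "at"}


def keywords(anchor_text):
    """Return the set of lowercase, non-stopword tokens in a link's anchor text."""
    return {w for w in re.findall(r"[a-z]+", anchor_text.lower())
            if w not in STOPWORDS}


class KeywordPrefetcher:
    """Perceptron-style predictor over (category, keyword) weights (assumed model)."""

    def __init__(self, threshold=0.5, rate=0.3):
        self.weights = {}           # (category, keyword) -> learned weight
        self.threshold = threshold  # score a link must exceed to be prefetched
        self.rate = rate            # step size for each weight correction

    def score(self, category, anchor_text):
        """Sum the weights of the link's keywords within its category."""
        return sum(self.weights.get((category, k), 0.0)
                   for k in keywords(anchor_text))

    def should_prefetch(self, category, anchor_text):
        return self.score(category, anchor_text) > self.threshold

    def learn(self, category, anchor_text, clicked):
        """Adjust weights only when the prediction disagrees with the user."""
        error = int(clicked) - int(self.should_prefetch(category, anchor_text))
        if error:
            for k in keywords(anchor_text):
                key = (category, k)
                self.weights[key] = self.weights.get(key, 0.0) + self.rate * error


if __name__ == "__main__":
    agent = KeywordPrefetcher()
    agent.learn("sports", "Tigers win the playoff opener", clicked=True)
    print(agent.should_prefetch("sports", "Tigers playoff schedule"))
    print(agent.should_prefetch("business", "Tigers playoff schedule"))
```

    Because weights change on every misprediction, in both directions, a sketch like this follows a shift in reading interest over time, which is the kind of adaptability the abstract attributes to the system.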