• DocumentCode
    1879943
  • Title
    Application of clickstream analysis as Web page importance metric in parallel crawlers
  • Author
    Selamat, Ali ; Ahmadi-Abkenari, Fatemeh
  • Author_Institution
    Intell. Software Eng. Lab., Univ. Teknol. Malaysia, Skudai, Malaysia
  • Volume
    1
  • fYear
    2010
  • fDate
    15-17 June 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Employing a parallel crawler, that is, a multi-process crawler, raises issues that do not arise with a single-process crawler. These issues affect whether a parallel crawler can achieve results of higher, or even the same, quality as a centralized one. Existing parallel crawler architectures employ link-dependent metrics, such as Backlink count or PageRank, to determine URL importance and prioritize the queue of each process. A specific number of the most important pages is then sent to the index section of the crawler for further processing of their content. Applying link-dependent metrics imposes considerable overhead on the overall parallel crawler, resulting from the exchange of link information among the processes. In this paper we propose the application of clickstream analysis as a link-independent Web page importance metric in a parallel crawler. Our approach includes an algorithm for balancing the performance of the processes within a parallel crawler, which results in the discovery of higher-quality pages by the overall parallel crawler with less overhead in comparison to a centralized crawler that employs link-dependent importance metrics.
  • Keywords
    Web sites; information filters; information retrieval; Backlink count; PageRank; URL importance determination; Web page importance metric; clickstream analysis; link dependant metrics; multi processes crawler; parallel crawlers; Crawlers; Equations; Fires; Mathematical model; Measurement; Web pages; Clickstream analysis; Parallel crawlers; Web data management; Web page Importance metrics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology (ITSim), 2010 International Symposium in
  • Conference_Location
    Kuala Lumpur
  • ISSN
    2155-897
  • Print_ISBN
    978-1-4244-6715-0
  • Type
    conf
  • DOI
    10.1109/ITSIM.2010.5561354
  • Filename
    5561354