DocumentCode
1879943
Title
Application of clickstream analysis as Web page importance metric in parallel crawlers
Author
Selamat, Ali ; Ahmadi-Abkenari, Fatemeh
Author_Institution
Intell. Software Eng. Lab., Univ. Teknol. Malaysia, Skudai, Malaysia
Volume
1
fYear
2010
fDate
15-17 June 2010
Firstpage
1
Lastpage
6
Abstract
Employing a parallel, multi-process crawler raises issues that do not arise with a single-process crawler. These issues affect whether a parallel crawler can achieve results of the same or higher quality than a centralized one. Existing parallel crawler architectures employ link-dependent metrics, such as backlink count or PageRank, to determine URL importance and prioritize the queue of each process. A specific number of the most important pages is then sent to the index section of the crawler for further processing of their content. Link-dependent metrics impose considerable overhead on the overall parallel crawler because link information must be exchanged among the different processes. In this paper we propose the application of clickstream analysis as a link-independent Web page importance metric in a parallel crawler. Our approach includes an algorithm for balancing the performance of the different processes within a parallel crawler, which results in the overall parallel crawler discovering higher-quality pages with less overhead than a centralized crawler that employs link-dependent importance metrics.
Keywords
Web sites; information filters; information retrieval; Backlink count; PageRank; URL importance determination; Web page importance metric; clickstream analysis; link dependant metrics; multi processes crawler; parallel crawlers; Crawlers; Equations; Fires; Mathematical model; Measurement; Web pages; Clickstream analysis; Parallel crawlers; Web data management; Web page Importance metrics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology (ITSim), 2010 International Symposium in
Conference_Location
Kuala Lumpur
ISSN
2155-897
Print_ISBN
978-1-4244-6715-0
Type
conf
DOI
10.1109/ITSIM.2010.5561354
Filename
5561354