Title :
A clickstream-based web page significance ranking metric for Web crawlers
Author :
Ahmadi-Abkenari, F. ; Selamat, Ali
Author_Institution :
Intell. Software Eng. Lab., Univ. of Technol. of Malaysia, Skudai, Malaysia
Abstract :
The unpredictable, rapid growth of the World Wide Web and its non-static nature pose considerable obstacles for Web crawlers, including incorrect and irrelevant answers among search results and scaling issues. Hence, more promising solutions are needed to provide more accurate search outcomes. Implementing existing Web page importance metrics, whether link-based or context-based, within a parallel crawler cannot fully resolve the coverage of authorized fresh Web content or the accuracy concerns, so employing these metrics is not the final approach within search engines' architecture. This paper proposes an analysis of clickstream data to discover the popularity of Web pages in the crawl frontier. It introduces the metric itself and presents experimental results on ranking the UTM Web pages with the proposed metric.
Keywords :
Web sites; authorisation; search engines; UTM Web pages; Web crawlers; Web page importance metrics; World Wide Web; authorized fresh Web content; clickstream data; clickstream-based Web page significance ranking metric; crawl frontier; parallel crawler; search engines architecture; Context; Crawlers; Educational institutions; Measurement; Search engines; Web pages; Clickstream analysis; Search engines; Web crawlers; Web data management; Web page importance metric;
Conference_Title :
Software Engineering (MySEC), 2011 5th Malaysian Conference in
Conference_Location :
Johor Bahru
Print_ISBN :
978-1-4577-1530-3
DOI :
10.1109/MySEC.2011.6140674