Title :
The anatomy of web crawlers
Author :
Sharma, Shruti ; Gupta, Parul
Author_Institution :
Dept. of Comput. Eng., YMCA Univ. of Sci. & Technol., Faridabad, India
Abstract :
The World Wide Web (WWW) is a vast and rich source of information. Search engines are generally used to retrieve information from this important resource, and to do so they rely on massive collections of web pages that have been downloaded by web crawlers. A web crawler is a program that traverses the web by following its ever-changing, dense, and distributed hyperlink structure and stores the downloaded pages in a large repository, which is later indexed for efficient execution of user queries. Web crawlers are therefore becoming increasingly important, and various web crawling architectures have been proposed in recent years. This paper surveys different web crawler architectures and compares them on important features such as scalability, manageability, page refresh policy, and politeness policy.
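The crawl loop described in the abstract (follow hyperlinks, store each downloaded page in a repository for later indexing) can be illustrated with a minimal sketch. This is not an architecture taken from the paper: the names crawl, LinkExtractor, fetch, and max_pages are hypothetical, the traversal order is assumed to be breadth-first, and politeness and page refresh policies are deliberately left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl that returns a {url: html} repository.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML text, or returns None when the page
    cannot be downloaded.
    """
    frontier = deque(seed_urls)   # URLs waiting to be downloaded
    seen = set(seed_urls)         # URLs already queued, to avoid repeats
    repository = {}               # downloaded pages, ready for indexing
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        repository[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queuing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return repository
```

Passing fetch as a parameter keeps the sketch independent of any network library; a dictionary's get method is enough to exercise it on an in-memory set of pages.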
Keywords :
Internet; Web sites; query processing; search engines; WWW; Web crawler anatomy; Web crawling architectures; Web pages; World Wide Web; distributed hyperlinked structure; imperative resource; page refresh policy; politeness policy; user queries; Automation; Computer architecture; Crawlers; Uniform resource locators; Focused Crawler; Hidden Web Crawler; Parallel Crawler; Web crawler
Conference_Title :
Computing, Communication & Automation (ICCCA), 2015 International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-8889-1
DOI :
10.1109/CCAA.2015.7148493