Title :
Application of VM-Based Computations to Speed Up the Web Crawling Process on Multi-core Processors
Author :
Al-Bahadili, Hussein ; Qtishat, Hamzah
Author_Institution :
Fac. of Inf. Technol., Univ. of Petra, Amman, Jordan
Abstract :
A Web crawler is an important component of a Web search engine. It demands a large amount of hardware resources to crawl data from the rapidly growing and changing Web, and the crawling process must run continuously to keep the data up to date. This paper develops a new approach to speeding up the crawling process on a multi-core processor by using virtualization. In this approach, the multi-core processor is divided into a number of virtual machines (VMs), which can concurrently perform different crawling tasks on different initial data. The paper presents a description, implementation, and evaluation of a VM-based distributed Web crawler. The speedup factor achieved by the VM-based crawler over a crawler without virtualization is estimated for various numbers of crawled documents. The effect of the number of VMs on the speedup factor is also investigated.
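The two ideas in the abstract, giving each VM different initial data and measuring a speedup factor, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `partition_seeds` and `speedup_factor` are hypothetical, the round-robin split is only one possible way to assign seed URLs, and the speedup factor is assumed to be the conventional ratio of crawl time without virtualization to crawl time with VMs, since the abstract does not give a formula.

```python
def partition_seeds(seeds, num_vms):
    """Split a list of seed URLs into num_vms groups, round-robin.

    Assumption: each group would serve as the initial data for one VM.
    """
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    return [seeds[i::num_vms] for i in range(num_vms)]


def speedup_factor(time_without_vms, time_with_vms):
    """Ratio of crawl time without virtualization to crawl time with VMs.

    Assumption: the conventional definition of speedup, not a formula
    taken from the paper.
    """
    if time_with_vms <= 0:
        raise ValueError("time_with_vms must be positive")
    return time_without_vms / time_with_vms
```

For example, `partition_seeds(["a", "b", "c", "d", "e"], 2)` yields two groups, one per VM, and `speedup_factor(120.0, 40.0)` gives 3.0 for a crawl that took 120 seconds without VMs and 40 seconds with them. These timings are made-up inputs for illustration, not results from the paper.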
Keywords :
Internet; multiprocessing systems; search engines; virtual machines; virtualisation; VM-based computations; VM-based crawler; VM-based distributed Web crawler; Web crawling process; Web search engine; hardware resources; multicore processors; speedup factor; virtual-machines; virtualization crawler; Crawlers; Engines; Hardware; Multicore processing; Virtualization; Web crawler; Web search engine; distributed crawling; multi-core processor; processor-farm methodology; virtual machines; virtualization;
Conference_Title :
Distributed Computing and Applications to Business, Engineering & Science (DCABES), 2013 12th International Symposium on
Conference_Location :
Kingston upon Thames, Surrey, UK
DOI :
10.1109/DCABES.2013.35