Title :
The Design and Implementation of the Crawler-Inar
Author :
Ding, Yu-xin ; Wang, Xiao-long ; Lin, Le-bin ; Zhang, Qi ; Wu, Yong-hui
Author_Institution :
Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen
Abstract :
This paper discusses the design and implementation of Inar, a Web crawler written in C++ that runs on Linux. Inar is a single-threaded crawler based on asynchronous I/O technology and is still under development. The paper describes the architecture of the crawler and discusses the design and function of each of its components in detail. For some design problems encountered in practice, such as URL queue design and hash algorithm design, we propose our solutions.
Keywords :
C++ language; Internet; Linux; search engines; C++; URL queue design; Web crawler-Inar; asynchronous I/O technology; hash algorithm design; search engine; single-threaded crawler; Algorithm design and analysis; Computer science; Crawlers; Cybernetics; HTML; Machine learning; Paper technology; Service oriented architecture; Uniform resource locators; Web pages; Web server; Crawler; asynchronous I/O; single thread; web
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.259171