DocumentCode :
2900813
Title :
The Design and Implementation of the Crawler-Inar
Author :
Ding, Yu-xin ; Wang, Xiao-long ; Lin, Le-bin ; Zhang, Qi ; Wu, Yong-hui
Author_Institution :
Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen
fYear :
2006
fDate :
13-16 Aug. 2006
Firstpage :
4527
Lastpage :
4530
Abstract :
This paper discusses the design and implementation of Inar, a Web crawler written in C++ that runs on Linux. Inar is a single-threaded crawler based on asynchronous I/O technology and is still under development. The paper describes the architecture of the crawler and discusses the design and function of each of its components in detail. For some design problems that we met in practice, such as URL queue design and hash algorithm design, we propose our solutions.
Keywords :
C++ language; Internet; Linux; search engines; C++; URL queue design; Web crawler-Inar; asynchronous I/O technology; hash algorithm design; search engine; single-threaded crawler; Algorithm design and analysis; Computer science; Crawlers; Cybernetics; HTML; Machine learning; Paper technology; Service oriented architecture; Uniform resource locators; Web pages; Web server; Crawler; asynchronous I/O; single thread; web
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
Type :
conf
DOI :
10.1109/ICMLC.2006.259171
Filename :
4028869