Title :
The research and implementation of Web Spider in Search Engine
Author :
Huang, Jian ; An, Jun-xiu ; Gan, Jian-hong
Author_Institution :
Coll. of Software Eng., CUIT, Chengdu, China
Abstract :
Building on further research into the Web Spider component of a search engine, this paper presents a new storage format for original web pages that both saves physical storage space and improves the efficiency of the program's I/O operations. To distinguish different web pages and evaluate the importance of the data they contain, a UrlRank algorithm is defined. Data is stored in a uniform encoding, and a code conversion algorithm is designed to convert data of different types to that uniform format. A web spider system is designed and implemented using parallel multi-threaded spider technology; the system improves page-crawling efficiency and supports resuming a crawl from a breakpoint. Testing verifies the effectiveness and integrity of the program's functions.
Keywords :
Internet; Web design; multi-threading; search engines; storage management; I/O operation; UrlRank algorithm; Web page storage format; Web spider; climbing page; code conversion algorithm; innovative design; parallel multithreaded Spider technology; search engine; storage space; uniform coding; Algorithm design and analysis; Encoding; Heuristic algorithms; Protocols; Synchronization; Web server; network spider; parallel technology
Conference_Title :
Apperceiving Computing and Intelligence Analysis (ICACIA), 2010 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-8025-8
DOI :
10.1109/ICACIA.2010.5709893