DocumentCode :
2474386
Title :
The research and implementation of Web Spider in Search Engine
Author :
Huang, Jian ; An, Jun-xiu ; Gan, Jian-hong
Author_Institution :
Coll. of Software Eng., CUIT, Chengdu, China
fYear :
2010
fDate :
17-19 Dec. 2010
Firstpage :
244
Lastpage :
247
Abstract :
Building on further research into Web spiders in search engines, this paper presents an innovative design for the original web page storage format that both saves physical storage space and improves the efficiency of I/O operations. To distinguish different web pages and evaluate the importance of the data in them, a definition of the UrlRank algorithm is presented. Data are stored with a uniform coding, and a code conversion algorithm is designed to convert data of different types to a uniform format. A web spider system is designed and implemented using parallel multi-threaded spider technology; the system improves the efficiency of page crawling and supports resuming a crawl from a breakpoint. Testing verifies the effectiveness and integrity of the program's functions.
Keywords :
Internet; Web design; multi-threading; search engines; storage management; I/O operation; UrlRank algorithm; Web page storage format; Web spider; climbing page; code conversion algorithm; innovative design; parallel multithreaded Spider technology; search engine; storage space; uniform coding; Algorithm design and analysis; Encoding; Heuristic algorithms; Protocols; Search engines; Synchronization; Web server; UrlRank algorithm; network spider; parallel technology; search engine; the code conversion algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Apperceiving Computing and Intelligence Analysis (ICACIA), 2010 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-8025-8
Type :
conf
DOI :
10.1109/ICACIA.2010.5709893
Filename :
5709893