Title :
Search-Engine-Oriented Theme Crawler Design
Author_Institution :
Yancheng Inst. of Technol., Yancheng, China
Abstract :
A theme crawler is the most important component of a vertical search engine. To retrieve relevant web pages efficiently and accurately, this paper studies the design of a theme crawler. Seed links and similarity measurement are the two key techniques of a theme crawler; both are explained in detail, and the relevant program code and algorithms are provided to illustrate them. The crawling process begins by fetching seed links, and the paper illustrates the host search engine, the search engine interface, and the link-fetching step. To improve crawler efficiency, a page evaluation model is added to the crawler module.
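The abstract does not say which similarity measure or page evaluation model the paper uses. The sketch below assumes a common choice, cosine similarity between term-frequency vectors, and uses the score to order a best-first crawl frontier. The function and class names (term_vector, cosine_similarity, Frontier) and the example.com URLs are hypothetical illustrations, not the paper's code.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-cased word counts; a deliberately crude tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Frontier:
    """Hypothetical best-first frontier: URLs with higher topic scores are fetched first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seen: set[str] = set()
        self._counter = 0  # tie-breaker: equal scores keep insertion order

    def push(self, url: str, score: float) -> None:
        # Each URL is queued at most once, with the score it was first seen with.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best URL first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    topic = term_vector("vertical search engine theme crawler")
    frontier = Frontier()
    # Hypothetical seed links with anchor text, as a host search engine might return them.
    candidates = [
        ("http://example.com/recipes", "easy pasta recipes"),
        ("http://example.com/crawler", "how a theme crawler feeds a vertical search engine"),
        ("http://example.com/engines", "search engine ranking basics"),
    ]
    for url, anchor in candidates:
        frontier.push(url, cosine_similarity(topic, term_vector(anchor)))
    while frontier:
        print(frontier.pop())
```

Scoring links by their anchor text is only one option; a page evaluation model could instead score the fetched page body, or combine several signals, before its outgoing links are queued.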
Keywords :
search engines; page evaluation; program codes; theme crawler; vertical search engine; Arrays; Crawlers; Engines; Google; Transforms; Web pages;
Conference_Title :
System Science, Engineering Design and Manufacturing Informatization (ICSEM), 2010 International Conference on
Conference_Location :
Yichang
Print_ISBN :
978-1-4244-8664-9
DOI :
10.1109/ICSEM.2010.169