DocumentCode :
2726026
Title :
The Design and Implementation of a Topic-Driven Crawler
Author :
Li, Qiong ; Jin, Tao ; Fu, Yuchen ; Liu, Quan ; Cui, Zhiming
fYear :
2007
fDate :
2-3 Dec. 2007
Firstpage :
153
Lastpage :
156
Abstract :
It is essential that users browsing the Internet can have web pages classified into a given topic as accurately as possible. As a result, topic-driven crawlers are becoming important tools for applications such as specialized web portals, online search, and competitive intelligence. This paper presents a topic-driven crawler that computes the degree of relevance of web pages and refines the preliminary set of related pages using term frequency/document frequency, entropy, and compiled rules. It also proposes a comparatively ideal system architecture for a topic-driven crawler, describes the relationships among its modules, and presents several of these modules in detail.
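The abstract names the ingredients of the relevance computation (term frequency/document frequency, entropy, compiled rules) but not the formulas. The sketch below is one plausible combination and is an assumption, not the authors' method. Each term gets a log-scaled local frequency weight multiplied by the standard log-entropy global weight, 1 + sum_j p_ij log(p_ij) / log(n). This global weight is near 0 for terms spread evenly over the n reference documents and equals 1 for terms concentrated in a single document. The page is then scored by cosine similarity against the topic terms. The "rule" is modelled as an optional compiled regular expression that a page must match. All function names, the threshold of 0.3, and the regex rule are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on non-alphanumeric characters (deliberately simple)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def entropy_weights(docs: list[list[str]]) -> dict[str, float]:
    """Log-entropy global weight per term: 1 + sum_j p_ij * log(p_ij) / log(n).

    p_ij = tf_ij / gf_i, where gf_i is the term's total count over all n documents.
    """
    n = len(docs)
    counts = [Counter(d) for d in docs]
    global_freq: Counter = Counter()
    for c in counts:
        global_freq.update(c)
    weights = {}
    for term, gf in global_freq.items():
        if n < 2:  # entropy is undefined for a single document; treat every term as fully discriminative
            weights[term] = 1.0
            continue
        h = sum((c[term] / gf) * math.log(c[term] / gf) for c in counts if c[term])
        weights[term] = 1.0 + h / math.log(n)
    return weights


def weighted_vector(tokens: list[str], weights: dict[str, float]) -> dict[str, float]:
    """Local weight log(1 + tf) times the global entropy weight (unseen terms default to 1.0)."""
    tf = Counter(tokens)
    return {t: math.log1p(f) * weights.get(t, 1.0) for t, f in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def relevance(page_text: str, topic_terms: list[str], weights: dict[str, float]) -> float:
    """Degree of relevance of a page to the topic, in [0, 1]."""
    return cosine(weighted_vector(tokenize(page_text), weights),
                  weighted_vector(topic_terms, weights))


def is_relevant(page_text: str, topic_terms: list[str], weights: dict[str, float],
                threshold: float = 0.3, rule: "re.Pattern[str] | None" = None) -> bool:
    """Keep a page only if it passes the (hypothetical) compiled rule and the score threshold."""
    if rule is not None and not rule.search(page_text):
        return False
    return relevance(page_text, topic_terms, weights) >= threshold
```

In use, the weights would be estimated once from a reference collection of pages and reused for every page the crawler fetches. Pages that fail `is_relevant` would be dropped from the preliminary result set.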
Keywords :
Application software; Competitive intelligence; Crawlers; Entropy; Frequency; Internet; Search engines; Sorting; Uniform resource locators; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Information Technology Application, Workshop on
Conference_Location :
Zhangjiajie
Print_ISBN :
978-0-7695-3063-5
Type :
conf
DOI :
10.1109/IITA.2007.33
Filename :
4426987