DocumentCode :
2404444
Title :
The BINGO! focused crawler: from bookmarks to archetypes
Author :
Sizov, Sergej ; Siersdorfer, Stefan ; Theobald, Martin ; Weikum, Gerhard
Author_Institution :
Saarlandes Univ., Saarbrücken, Germany
fYear :
2002
fDate :
2002
Firstpage :
337
Lastpage :
338
Abstract :
The BINGO! system implements an approach to focused crawling that aims to overcome the limitations of the initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic "archetypes" and uses them for periodically re-training the classifier; this way the crawler is dynamically adapted based on the most significant documents seen so far. Two kinds of archetypes are considered: good authorities as determined by employing Kleinberg's link analysis algorithm, and documents that have been automatically classified with high confidence using a linear SVM classifier.
Keywords :
classification; hypermedia markup languages; BINGO! focused crawler; Kleinberg link analysis algorithm; archetypes; best URLs; bookmarks; crawl frontier; linear SVM classifier; positively classified documents; re-training; Costs; Crawlers; Humans; Ontologies; Search engines; Support vector machine classification; Support vector machines; Training data; Uniform resource locators; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2002. Proceedings. 18th International Conference on
Conference_Location :
San Jose, CA
ISSN :
1063-6382
Print_ISBN :
0-7695-1531-2
Type :
conf
DOI :
10.1109/ICDE.2002.994746
Filename :
994746