Title :
Estimation of Optimal Topic Spider Strategy by Use of Decision Trees
Author_Institution :
Xiamen Univ., Xiamen
Date :
May 30 2007-June 1 2007
Abstract :
The design of a good topic spider entails an optimal strategy for prioritizing unvisited URLs. This paper applies a decision tree to the anchor texts of hyperlinks to determine that prioritization. A novel taxonomy-based topic relevance computation function, which embeds machine learning, classifies pages. Evaluation on different data sets shows that the proposed approach leads to promising results.
Keywords :
classification; decision trees; learning (artificial intelligence); relevance feedback; search engines; vocabulary; Web crawling; Web page classification; decision tree; machine learning; optimal topic spider strategy estimation; search engine; taxonomy based topic relevance computation function; Automatic control; Crawlers; Decision trees; Design automation; Machine learning; Optimal control; Taxonomy; Uniform resource locators; Vocabulary; Web pages; optimal estimation; topic spider
Conference_Title :
Control and Automation, 2007. ICCA 2007. IEEE International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-4244-0818-4
Electronic_ISBN :
978-1-4244-0818-4
DOI :
10.1109/ICCA.2007.4376873