Title :
Improving Relevance Prediction for Focused Web Crawlers
Author :
Safran, Mejdl S. ; Althagafi, Abdullah ; Che, Dunren
fDate :
May 30 2012-June 1 2012
Abstract :
A key issue in designing a focused Web crawler is how to determine whether an unvisited URL is relevant to the search topic. Effective relevance prediction helps the crawler avoid downloading and visiting many irrelevant pages. In this paper, we propose a new learning-based approach to improve relevance prediction in focused Web crawlers. For this study, we chose a Naïve Bayesian classifier as the base prediction model, although it can easily be replaced by a different prediction model. Experimental results show that our approach is valid and more efficient than related approaches.
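The abstract does not say which features the authors extract from an unvisited URL or how their learning step works. The sketch below is only a generic illustration of Naïve Bayesian relevance prediction in a focused crawler. It assumes that word tokens from the URL and its anchor text serve as features, and it uses a two-class multinomial Naïve Bayes model with Laplace smoothing to order the crawl frontier. The names `tokenize`, `NaiveBayesRelevance`, and `prob_relevant`, as well as the example URLs, are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(url, anchor_text=""):
    # Hypothetical feature extractor (assumption): lowercase alphanumeric
    # tokens taken from the URL string and the link's anchor text.
    return re.findall(r"[a-z0-9]+", (url + " " + anchor_text).lower())


class NaiveBayesRelevance:
    """Two-class multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.doc_counts = Counter()  # training links seen per class
        self.token_counts = {True: Counter(), False: Counter()}
        self.vocab = set()

    def train(self, url, anchor_text, relevant):
        tokens = tokenize(url, anchor_text)
        self.doc_counts[relevant] += 1
        self.token_counts[relevant].update(tokens)
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior P(label)
        score = math.log((self.doc_counts[label] + self.alpha)
                         / (total_docs + 2 * self.alpha))
        total_tokens = sum(self.token_counts[label].values())
        v = max(len(self.vocab), 1)
        for t in tokens:
            if t not in self.vocab:
                continue  # ignore tokens never seen during training
            # Smoothed log likelihood P(token | label)
            score += math.log((self.token_counts[label][t] + self.alpha)
                              / (total_tokens + self.alpha * v))
        return score

    def prob_relevant(self, url, anchor_text=""):
        tokens = tokenize(url, anchor_text)
        a = self._log_score(tokens, True)
        b = self._log_score(tokens, False)
        # Normalize the two log scores into P(relevant | tokens) stably
        m = max(a, b)
        ea, eb = math.exp(a - m), math.exp(b - m)
        return ea / (ea + eb)


# Usage: train on labeled links, then rank unvisited URLs by predicted relevance.
model = NaiveBayesRelevance()
model.train("http://example.com/stock-market-news", "stock market", True)
model.train("http://example.com/finance/stocks", "stock prices", True)
model.train("http://example.com/recipes/pasta", "pasta recipe", False)
model.train("http://example.com/sports/football", "football scores", False)

frontier = []  # min-heap keyed on negative probability, so most relevant pops first
for url, anchor in [("http://example.com/stock-tips", "market tips"),
                    ("http://example.com/pasta-sauce", "recipe")]:
    heapq.heappush(frontier, (-model.prob_relevant(url, anchor), url))

next_url = heapq.heappop(frontier)[1]
```

Because the crawler only calls `prob_relevant`, another classifier that exposes the same method could be swapped in. This loosely mirrors the abstract's point that the base model is replaceable, but it is not the authors' design.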
Keywords :
Bayes methods; Web sites; data mining; learning (artificial intelligence); prediction theory; relevance feedback; search engines; Naive Bayesian prediction model; URL; focused Web crawlers; learning-based approach; relevance prediction; search topic; Bayesian methods; Classification algorithms; Crawlers; Prediction algorithms; Search engines; Stock markets; Training; Focused crawler; relevance prediction; web mining;
Conference_Title :
Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-1536-4
DOI :
10.1109/ICIS.2012.61