Title :
Improved focused crawling using Bayesian object-based approach
Author :
Ghozia, Ahmed; Sorour, Hoda; Aboshosha, Ashraf
Author_Institution :
Comput. Eng. & Sci. Dept., Menofiya Univ., Menouf
Abstract :
The rapid growth of the World-Wide-Web has made it difficult for general-purpose search engines, e.g. Google and Yahoo, to retrieve most of the relevant results in response to user queries. Vertical search engines specialized in a specific topic have therefore become vital. A vertical search engine is built with the help of a focused crawler, which traverses the Web selecting pages relevant to a predefined topic and ignoring those outside its scope. The focused crawler is guided toward relevant pages by a crawling strategy. In this paper, a new crawling strategy is presented that helps build a vertical search engine. With this strategy, the crawler is kept focused on the user's interests in the topic. We build a model that describes the Web page features that distinguish relevant Web documents from irrelevant ones. This is done through a supervised learning process: each Web page is treated as an object having a set of features, and the feature values determine the page's relevance through a Bayesian model. Results from practical experiments demonstrated the efficiency of the proposed crawling strategy.
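The abstract does not specify the page features or the exact Bayesian model, so the following is only a minimal sketch of the general idea under stated assumptions. It treats each page as an object with boolean features (the feature names `kw_in_title`, `kw_in_url`, `kw_in_anchor`, `kw_in_body` are hypothetical), trains a Bernoulli naive Bayes classifier with Laplace smoothing as a stand-in for the paper's Bayesian model, and uses the posterior P(relevant | features) to drive a best-first crawl. The in-memory `web` dictionary replaces real HTTP fetching, and the rule that outlinks inherit their parent's relevance score as crawl priority is an assumed heuristic, not necessarily the authors' strategy.

```python
import heapq
import math
from dataclasses import dataclass, field

# Hypothetical boolean page features; the paper's actual feature set is not given.
FEATURES = ("kw_in_title", "kw_in_url", "kw_in_anchor", "kw_in_body")


@dataclass
class Page:
    url: str
    features: dict                      # feature name -> bool
    links: list = field(default_factory=list)


class NaiveBayesRelevance:
    """Bernoulli naive Bayes over boolean page features, with Laplace smoothing."""

    def fit(self, pages, labels):
        self.classes = (True, False)    # True = relevant, False = irrelevant
        n = len(labels)
        self.prior, self.cond = {}, {}  # cond[(class, f)] = P(f is True | class)
        for c in self.classes:
            members = [p for p, y in zip(pages, labels) if y == c]
            self.prior[c] = (len(members) + 1) / (n + 2)
            for f in FEATURES:
                hits = sum(1 for p in members if p.features.get(f, False))
                self.cond[(c, f)] = (hits + 1) / (len(members) + 2)
        return self

    def posterior(self, page):
        """Return P(relevant | features), accumulated in log space for stability."""
        log_score = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for f in FEATURES:
                p = self.cond[(c, f)]
                s += math.log(p if page.features.get(f, False) else 1 - p)
            log_score[c] = s
        m = max(log_score.values())
        den = sum(math.exp(v - m) for v in log_score.values())
        return math.exp(log_score[True] - m) / den


def focused_crawl(web, seeds, model, threshold=0.5, max_pages=100):
    """Best-first crawl; outlinks of relevant pages inherit the parent's score."""
    frontier = [(-1.0, url) for url in seeds]   # negated scores: max-heap behaviour
    heapq.heapify(frontier)
    visited, relevant = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        page = web[url]
        score = model.posterior(page)
        if score >= threshold:
            relevant.append(url)
            for link in page.links:             # irrelevant pages are not expanded
                if link not in visited:
                    heapq.heappush(frontier, (-score, link))
    return relevant


def make(url, title, url_kw, anchor, body, links=()):
    return Page(url, dict(zip(FEATURES, (title, url_kw, anchor, body))), list(links))


# Toy labelled training data and a toy link graph (all illustrative).
train = [make("a", True, True, True, True), make("b", True, False, True, True),
         make("c", False, False, False, False), make("d", False, False, True, False)]
labels = [True, True, False, False]
model = NaiveBayesRelevance().fit(train, labels)

web = {
    "seed": make("seed", True, True, False, True, ["p1", "p2"]),
    "p1": make("p1", True, False, True, True, ["p3"]),
    "p2": make("p2", False, False, False, False, ["p4"]),
    "p3": make("p3", False, True, True, True),
    "p4": make("p4", True, True, True, True),
}
print(focused_crawl(web, ["seed"], model))  # ['seed', 'p1', 'p3']
```

In this toy run, `p2` scores about 0.036 and is not expanded, so `p4` is never reached even though it would score as relevant. This illustrates the usual trade-off of pruning a focused crawl: fewer wasted fetches at the cost of some missed pages behind irrelevant ones.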
Keywords :
Bayes methods; Internet; learning (artificial intelligence); query processing; search engines; Bayesian object based approach; Web page; World-Wide-Web; focused crawling; information retrieval; supervised learning process; user queries; vertical search engine; Bayesian methods; Crawlers; Genetic algorithms; Global Positioning System; Indexing; Predictive models; Search engines; Supervised learning; Uniform resource locators; Web pages;
Conference_Title :
National Radio Science Conference, 2008 (NRSC 2008)
Conference_Location :
Tanta
Print_ISBN :
978-977-5031-95-2
DOI :
10.1109/NRSC.2008.4542363