Title :
Effect of feature selection method on the performance of focused crawlers—A case study on traditional and accelerated focused crawlers
Author :
Gadiraju, N. V G Sirisha ; Chaitanya, R. Krishna ; Raju, G. V Padma
Author_Institution :
Dept. of CSE, S.R.K.R. Eng. Coll., Bhimavaram, India
Abstract :
This paper examines how the feature selection method affects the performance of the Traditional Focused Crawler (TFC) and the Accelerated Focused Crawler (AFC). Information retrieval methods such as querying a search engine, using a web catalog, and browsing may not satisfy the information needs of all users. When the information requirement concerns a specific topic, focused crawlers complement these methods. The aim of these crawlers is to download web pages that are highly relevant to a predefined topic. A naive Bayesian classifier guides the crawlers by rating each web page before it is downloaded. For this analysis, the topics to be crawled are represented by a set of relevant documents. The features the Bayesian classifier uses to construct its model are collected from the document corpus using the Document Frequency and Information Gain feature selection methods. The performance of both crawlers is evaluated when 500 features are selected using the Document Frequency and Information Gain feature selection methods. The Accelerated Focused Crawler's performance is also evaluated for varying numbers of features gathered using both feature selection methods. Target pages recall and target description recall are used to evaluate the crawlers.
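The two feature selection methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes documents are already tokenized, assumes a binary labeling ("relevant" vs. "other") that the abstract does not specify, and uses the standard textbook definition of information gain, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t). All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Score each term by how much its presence/absence reduces class entropy."""
    n = len(docs)
    class_counts = Counter(labels)
    classes = sorted(class_counts)
    h_c = entropy([class_counts[c] for c in classes])

    # term -> class counts among the documents that contain the term
    present = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            present.setdefault(term, Counter())[label] += 1

    ig = {}
    for term, with_t in present.items():
        n_t = sum(with_t.values())
        h_with = entropy([with_t[c] for c in classes])
        h_without = entropy([class_counts[c] - with_t[c] for c in classes])
        ig[term] = h_c - (n_t / n) * h_with - ((n - n_t) / n) * h_without
    return ig


def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (assumed data, for illustration only).
docs = [
    ["crawler", "web", "topic"],
    ["crawler", "classifier", "web"],
    ["football", "web", "score"],
    ["football", "match", "score"],
]
labels = ["relevant", "relevant", "other", "other"]

df_scores = document_frequency(docs)
ig_scores = information_gain(docs, labels)
print(top_k(df_scores, 3))
print(top_k(ig_scores, 3))
```

In this toy corpus, "web" has the highest document frequency but a low information gain because it appears in both classes, which hints at why the choice of method can change the feature set handed to the classifier. In the setting the abstract describes, `k` would be 500 or another feature count under study.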
Keywords :
Bayes methods; Internet; pattern classification; query processing; search engines; Information retrieval methods; Web catalog; Web pages; accelerated focused crawler; document frequency; feature selection method; information gain feature selection methods; naive Bayesian classifier; search engine querying; target description recall; target pages recall; traditional focused crawler; Acceleration; Bayesian methods; Crawlers; Educational institutions; Frequency; Information retrieval; Information technology; Search engines; Taxonomy; Web pages; Accelerated Focused Crawler; Classifier; Feature Selection; Focused Crawler; Performance
Conference_Title :
Networking and Information Technology (ICNIT), 2010 International Conference on
Conference_Location :
Manila
Print_ISBN :
978-1-4244-7579-7
Electronic_ISBN :
978-1-4244-7578-0
DOI :
10.1109/ICNIT.2010.5508468