An Architectural Framework of a Crawler for Retrieving Highly Relevant Web Documents by Filtering Replicated Web Collections

Author

Shekhar, Shashi ; Agrawal, Rohit ; Arya, Karm Veer

Author_Institution

GLA Inst. of Technol. & Manage., Mathura, India

fYear

2010

fDate

20-21 June 2010

Firstpage

29

Lastpage

33

Abstract

As the Web continues to grow, finding relevant information with traditional search engines has become difficult. Many index-based Web search engines exist for searching information in various domains on the Web. The documents (URLs) these engines retrieve for a searched topic are often of poor quality, and because the number of Web pages is growing rapidly, devising a personalized Web search is of great importance. This paper proposes a method to reduce the time spent browsing search results by providing a personalized Web Search Agent (MetaCrawler). In the proposed technique of personalized Web searching, Web pages relevant to the user's interests are ranked at the front of the result list, giving the user quick access to those links. An experiment was designed and conducted to test the performance of the proposed Web-filtering approach. The experimental results suggest substantial improvement in the crawling strategy, especially when the search strings are short.
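
The abstract does not say how relevance to user interests is scored. As an illustration only, the general idea of moving interest-matching results to the front of the list can be sketched as below. This is a minimal sketch and not the authors' method: it assumes a bag-of-words cosine similarity between each result snippet and a free-text interest profile, and the names `term_vector`, `cosine`, `rerank`, and the example URLs are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(results, interest_profile):
    """Order (url, snippet) pairs by similarity to the interest profile.

    Assumption: similarity of snippet text to the profile stands in for
    "relevance to user interests"; the paper may use a different measure.
    """
    profile = term_vector(interest_profile)
    return sorted(
        results,
        key=lambda r: cosine(term_vector(r[1]), profile),
        reverse=True,
    )


# Hypothetical search results for the ambiguous query "jaguar".
results = [
    ("http://example.com/a", "jaguar car dealership prices"),
    ("http://example.com/b", "jaguar habitat rainforest wildlife"),
]
ranked = rerank(results, "wildlife animals rainforest conservation")
```

With a profile about wildlife, the second result shares terms with the profile and the first shares none, so the second is placed first. A short query such as "jaguar" carries little context on its own, which is the case where the abstract reports the largest improvement.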

Keywords

Computer networks; Crawlers; Data mining; Information filtering; Information filters; Intelligent agent; Search engines; Uniform resource locators; Web pages; Web search; Link analysis; Search result ranking; Web IR; Web crawler; Web page classification;

fLanguage

English

Publisher

IEEE

Conference_Titel

Advances in Computer Engineering (ACE), 2010 International Conference on

Conference_Location

Bangalore, Karnataka, India

Print_ISBN

978-1-4244-7154-6

Type

conf

DOI

10.1109/ACE.2010.64

Filename

5532879