DocumentCode
3310248
Title
An Architectural Framework of a Crawler for Retrieving Highly Relevant Web Documents by Filtering Replicated Web Collections
Author
Shekhar, Shashi ; Agrawal, Rohit ; Arya, Karm Veer
Author_Institution
GLA Inst. of Technol. & Manage., Mathura, India
fYear
2010
fDate
20-21 June 2010
Firstpage
29
Lastpage
33
Abstract
As the Web continues to grow, it has become difficult to find relevant information using traditional search engines. Many index-based Web search engines exist for searching information in various domains on the Web, but the documents (URLs) they retrieve for a searched topic are often of poor quality. Moreover, because the number of Web pages is growing rapidly, devising a personalized Web search is of great importance. This paper proposes a method to reduce the time spent browsing search results by providing a personalized Web Search Agent (MetaCrawler). In the proposed personalized Web search technique, Web pages relevant to the user's interests are ranked at the front of the result list, allowing the user to quickly access those links. An experiment was designed and conducted to test the performance of the proposed Web-Filtering approach. The experimental results suggest a substantial improvement in the crawling strategy, especially when the search strings are short.
Keywords
Computer networks; Crawlers; Data mining; Information filtering; Information filters; Intelligent agent; Search engines; Uniform resource locators; Web pages; Web search; Link analysis; Search result ranking; Web IR; Web crawler; Web page classification;
fLanguage
English
Publisher
ieee
Conference_Titel
Advances in Computer Engineering (ACE), 2010 International Conference on
Conference_Location
Bangalore, Karnataka, India
Print_ISBN
978-1-4244-7154-6
Type
conf
DOI
10.1109/ACE.2010.64
Filename
5532879