DocumentCode :
480262
Title :
Notice of Violation of IEEE Publication Principles
Increasing Search Engine Efficiency Using Cooperative Web
Author :
Choudhari, Rahul ; Choudhari, R.D. ; Choudhari, Ajay
Author_Institution :
Indian Inst. of Inf. Technol. & Manage., Gwalior
Volume :
4
fYear :
2008
fDate :
12-14 Dec. 2008
Firstpage :
1040
Lastpage :
1044
Abstract :
Notice of Violation of IEEE Publication Principles

"Increasing Search Engine Efficiency using Cooperative Web"
by Rahul Choudhari, Ajay Choudhari, R. D. Choudhari
in the Proceedings of the 2008 International Conference on Computer Science and Software Engineering (CSSE 2008), Wuhan, China, December 12, 2008

After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles.

This paper contains significant portions of original text from the paper cited below. The original text was copied without attribution (including appropriate references to the original author(s) and/or paper title) and without permission.

"Towards a Content-Provider-Friendly Web Page Crawler"
by Jie Xu, Qinglan Li, Huiming Qu, Alexandros Labrinidis,
in Proceedings of the 10th International Workshop on Web and Databases (WebDB 2007), Beijing, China, June 15, 2007

The performance of a search engine depends mainly on two factors. The first is the freshness of the search engine's index, which maintains Web content in the repository. The second is the quality of the ranking or matching algorithm. The first factor is a never-ending quest because the content of the Web keeps changing over time. Web crawlers crawl Web pages and refresh the index for the search engine. To keep search results fresh, the crawling of a Web page should be fundamentally linked to the update frequency of that page. However, the size of the Web today and the inherent resource constraints pose a trade-off: re-crawling too frequently wastes bandwidth, while re-crawling too infrequently leads to poor search engine performance. In this paper, we address the scheduling problem for Web crawlers and propose a solution, with the objective of optimizing the freshness of the repository and the quality of the index. To this end, we divide Web content providers into two groups: 1) active; 2) inactive. For inactive content providers, we use agents that continuously crawl the content providers and collect their update patterns. We also propose a scheduling scheme that capitalizes on the information given by the agents. Extensive experiments with real Web traces demonstrate that it plays a major role in improving the content quality of the index.
Keywords :
Internet; groupware; scheduling; search engines; Web crawler; Web pages; cooperative Web; matching algorithm; ranking algorithm; scheduling problem; search engine; content providers; crawlers; freshness of repository; quality of index
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Hubei
Print_ISBN :
978-0-7695-3336-0
Type :
conf
DOI :
10.1109/CSSE.2008.998
Filename :
4722797