Notice of Violation of IEEE Publication Principles
Increasing Search Engine Efficiency Using Cooperative Web

Author

Choudhari, Rahul ; Choudhari, R.D. ; Choudhari, Ajay

Author_Institution

Indian Inst. of Inf. Technol. & Manage., Gwalior

Volume

4

fYear

2008

fDate

12-14 Dec. 2008

Firstpage

1040

Lastpage

1044

Abstract

Notice of Violation of IEEE Publication Principles

"Increasing Search Engine Efficiency using Cooperative Web"
by Rahul Choudhari, Ajay Choudhari, R. D. Choudhari
in the Proceedings of the 2008 International Conference on Computer Science and Software Engineering (CSSE 2008), Wuhan, China, December 12, 2008

After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles.

This paper contains significant portions of original text from the paper cited below. The original text was copied without attribution (including appropriate references to the original author(s) and/or paper title) and without permission.

"Towards a Content-Provider-Friendly Web Page Crawler"
by Jie Xu, Qinglan Li, Huiming Qu, Alexandros Labrinidis,
in Proceedings of the 10th International Workshop on Web and Databases (WebDB 2007), Beijing, China, June 15, 2007

The performance of a search engine depends mainly on two factors. The first is the freshness of the search engine's index, which maintains Web content in the repository; the second is the quality of the ranking (or matching) algorithm. The first factor is a never-ending quest because Web content keeps changing over time. Web crawlers crawl Web pages and refresh the search engine's index. To keep search results fresh, the crawling of a Web page should be fundamentally linked to how frequently that page is updated. However, given the size of today's Web and the inherent resource constraints, re-crawling too frequently wastes bandwidth, while re-crawling too infrequently degrades the performance of the search engine. In this paper, we address the scheduling problem for Web crawlers and propose a solution, with the objective of optimizing resources with respect to repository freshness and index quality. Towards this end, we divide Web content providers into two groups: 1) active and 2) inactive. For inactive content providers, we use agents that continuously crawl these providers and collect their update patterns. We also propose a scheduling scheme that capitalizes on the information provided by the agents. Extensive experiments with real Web traces demonstrate that the scheme plays a major role in improving the content quality of the index.
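
The abstract's central point, that re-crawl frequency should track how often a provider actually updates, can be illustrated with a minimal Python sketch. This is not the authors' scheduling scheme, which the abstract does not specify, and the names `Provider`, `estimated_change_rate`, and `allocate_crawls` are hypothetical. The sketch assumes that a monitoring agent reports the update timestamps it observed for each inactive provider over a fixed window. It also assumes that active providers notify the crawler of changes themselves and are therefore left out of the budget. Finally, it assumes that a fixed daily crawl budget is split in proportion to each inactive provider's estimated change rate.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    # Assumption: "active" providers report their own changes, so the
    # crawler does not need to schedule re-crawls for them.
    active: bool
    # Update timestamps (in hours) observed by a hypothetical monitoring agent.
    observed_updates: list[float]
    # Length of the agent's observation window, in hours.
    window_hours: float


def estimated_change_rate(p: Provider) -> float:
    """Naive estimate: observed updates per hour over the agent's window."""
    if p.window_hours <= 0:
        raise ValueError("observation window must be positive")
    return len(p.observed_updates) / p.window_hours


def allocate_crawls(providers: list[Provider], budget_per_day: int) -> dict[str, int]:
    """Split a daily crawl budget across inactive providers in proportion
    to their estimated change rates (rounded down to whole crawls)."""
    inactive = [p for p in providers if not p.active]
    rates = {p.name: estimated_change_rate(p) for p in inactive}
    total = sum(rates.values())
    if total == 0:
        # No observed updates at all: nothing to prioritize.
        return {name: 0 for name in rates}
    return {name: int(budget_per_day * r / total) for name, r in rates.items()}


if __name__ == "__main__":
    providers = [
        Provider("a", False, [float(h) for h in range(0, 24, 2)], 24.0),  # 12 updates
        Provider("b", False, [float(h) for h in range(0, 24, 4)], 24.0),  # 6 updates
        Provider("c", True, [], 24.0),  # active: excluded from the budget
    ]
    print(allocate_crawls(providers, budget_per_day=30))
```

For example, a provider observed updating 12 times in 24 hours receives twice as many crawls as one observed updating 6 times. Proportional allocation is used here only because it is the simplest policy that ties crawl frequency to update frequency. It is not necessarily the freshness-optimal allocation, and rounding down can leave part of the budget unused.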

Keywords

Internet; groupware; scheduling; search engines; Web crawler; Web pages; cooperative Web; matching algorithm; ranking algorithm; scheduling problem; search engine; content providers; crawlers; freshness of repository; quality of index

fLanguage

English

Publisher

ieee

Conference_Title

Computer Science and Software Engineering, 2008 International Conference on

Conference_Location

Hubei

Print_ISBN

978-0-7695-3336-0

Type

conf

DOI

10.1109/CSSE.2008.998

Filename

4722797
