DocumentCode
3768210
Title
Classification & detection of near duplicate web pages using five stage algorithm
Author
Eldhose P Sim
Author_Institution
Department of Computer Science & Engineering, Cochin College of Engineering & Technology, Valanchery, Malapuram
fYear
2015
Firstpage
1
Lastpage
5
Abstract
In recent years the number of web pages has grown massively. Search engines now index billions of pages, which decreases the efficiency and effectiveness of their search results. Many of these pages are duplicates or near duplicates of one another. This paper deals with the classification of duplicate web pages. We propose a five-stage algorithm for the detection of near-duplicate web pages, comprising pre-processing, minimum weighting, filtering, verification, and classification of the web page using the Apriori algorithm.
Keywords
"Web pages","Filtering","Search engines","Classification algorithms","Feature extraction","Algorithm design and analysis"
Publisher
ieee
Conference_Titel
Green Engineering and Technologies (IC-GET), 2015 Online International Conference on
Type
conf
DOI
10.1109/GET.2015.7453837
Filename
7453837