Title :
Classification & detection of near duplicate web pages using five stage algorithm
Author_Institution :
Department of Computer Science & Engineering, Cochin College of Engineering & Technology, Valanchery, Malappuram
Abstract :
In recent years the number of web pages has grown massively. Search engines now index billions of pages, which reduces the efficiency and effectiveness of their search results. Existing pages may be exact duplicates or near duplicates of one another. This paper deals with the classification of duplicate web pages. We propose a five-stage algorithm for detecting near-duplicate web pages, consisting of pre-processing, minimum weighting, filtering, verification, and classification of the web page using the Apriori algorithm.
Keywords :
"Web pages","Filtering","Search engines","Classification algorithms","Feature extraction","Algorithm design and analysis"
Conference_Title :
Green Engineering and Technologies (IC-GET), 2015 Online International Conference on
DOI :
10.1109/GET.2015.7453837