• DocumentCode
    3768210
  • Title

    Classification & detection of near duplicate web pages using five stage algorithm

  • Author

    Eldhose P Sim

  • Author_Institution
    Department of Computer Science & Engineering, Cochin College of Engineering & Technology, Valanchery, Malapuram
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In recent years the number of web pages has grown massively; search engines now index billions of pages, which reduces the efficiency and effectiveness of their search results. Many of these pages are duplicates or near duplicates of one another. This paper addresses the classification of duplicate web pages and proposes a five-stage algorithm for detecting near-duplicate web pages, comprising pre-processing, minimum weighting, filtering, verification, and classification of the web page using the Apriori algorithm.
  • Keywords
    "Web pages","Filtering","Search engines","Classification algorithms","Feature extraction","Algorithm design and analysis"
  • Publisher
    ieee
  • Conference_Titel
    Green Engineering and Technologies (IC-GET), 2015 Online International Conference on
  • Type

    conf

  • DOI
    10.1109/GET.2015.7453837
  • Filename
    7453837