• DocumentCode
    3309812
  • Title
    ASPDD: An Efficient and Scalable Framework for Duplication Detection
  • Author
    Latha, K. ; Rajmohan, B. ; Rajaram, R.
  • Author_Institution
    Comput. Sci. & Eng. Dept., Anna Univ. Tiruchirappalli, Tiruchirappalli, India
  • fYear
    2010
  • fDate
    20-21 June 2010
  • Firstpage
    153
  • Lastpage
    157
  • Abstract
    This paper introduces a framework for the duplicate document detection problem that applies an efficient dynamic programming algorithm, All Pairs Shortest Path, to the text collection. Our goal in this work is to investigate the phenomenon of duplication and determine the approach that minimizes the impact of duplicates on search results. We show that our approach scales with the number of documents and works well for documents from all domains. We compared our solution to the state of the art and found that our method produces promising results: in addition to improving the accuracy of exact duplicate detection, it also detects partial and neighbor replicas. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.
  • Keywords
    Character generation; Computer science; Costs; Degradation; Educational institutions; Fingerprint recognition; Information technology; Robustness; Sorting; Wildlife; All Pairs Shortest Path; Degradation Effects; Duplicate Document Detection; Neighbor Replica; Partial Replica;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computer Engineering (ACE), 2010 International Conference on
  • Conference_Location
    Bangalore, Karnataka, India
  • Print_ISBN
    978-1-4244-7154-6
  • Type
    conf
  • DOI
    10.1109/ACE.2010.61
  • Filename
    5532856