• DocumentCode
    1961870
  • Title

    Data redundancy and duplicate detection in spatial join processing

  • Author

    Dittrich, Jens-Peter ; Seeger, Bernhard

  • Author_Institution
    Dept. of Math. & Comput. Sci., Marburg Univ., Germany
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    535
  • Lastpage
    546
  • Abstract
    The partition-based spatial-merge join (PBSM) of J.M. Patel and D.J. DeWitt (1996) and the size separation spatial join (S3J) of N. Koudas and K.C. Sevcik (1997) are considered to be among the most efficient methods for processing spatial (intersection) joins on two or more spatial relations. Neither method assumes the presence of pre-existing spatial indices on the relations. In this paper, we propose several improvements to these join algorithms. In particular, we deal with the impact of data redundancy and duplicate detection on the performance of these methods. For PBSM, we present a simple and inexpensive online method to detect duplicates in the response set. There is no longer any need to eliminate duplicates in a final sorting phase, as was originally suggested. We also investigate the impact of different internal algorithms on the total run-time of PBSM. For S3 J, we break with the original design goal and introduce controlled redundancy of data objects. Results of a large set of experiments with real data sets reveal that our suggested modifications to PBSM and S3J result in substantial performance improvements, where PBSM is generally superior to S3J
  • Keywords
    data reduction; merging; query processing; redundancy; software performance evaluation; visual databases; controlled data object redundancy; data redundancy; duplicate detection; internal algorithms; intersection joins; online method; partition-based spatial-merge join; performance; response set; run-time; size separation spatial join; sorting phase; spatial join processing; spatial relations; Availability; Computer science; Costs; Filters; Lead; Mathematics; Multidimensional systems; Partitioning algorithms; Runtime; Sorting;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2000. Proceedings. 16th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0506-6
  • Type

    conf

  • DOI
    10.1109/ICDE.2000.839452
  • Filename
    839452