• DocumentCode
    1906148
  • Title
    An efficient approach for data-duplication detection based on RDBMS
  • Author
    Chanhom, Kiettisak ; Natwichai, Juggapong
  • Author_Institution
    Comput. Eng. Dept., Chiang Mai Univ., Chiang Mai, Thailand
  • fYear
    2011
  • fDate
    11-13 May 2011
  • Firstpage
    325
  • Lastpage
    330
  • Abstract
    Data duplication is one of the most important issues in information system management. Instead of a single real-world object being stored as one entity in an information system, duplication can occur, in which more than one entity represents the same object. This problem can degrade the quality of service of information systems. In this paper, we propose an efficient approach to detecting duplication that is built on the RDBMS. The approach assumes that the data to be processed are already stored in the RDBMS, so it does not require the data to be imported into or exported from storage. The approach also benefits from the query optimizer of the RDBMS. Experimental results on the TPC-H dataset are presented to validate the proposed work.
  • Keywords
    data handling; relational databases; RDBMS; TPC-H dataset; data duplication detection; information system management; quality of service; query optimizer; relational database management system; detection; duplication; efficiency; query optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering (JCSSE), 2011 Eighth International Joint Conference on
  • Conference_Location
    Nakhon Pathom
  • Print_ISBN
    978-1-4577-0686-8
  • Type
    conf
  • DOI
    10.1109/JCSSE.2011.5930142
  • Filename
    5930142
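
The abstract describes detecting duplicates inside the RDBMS rather than exporting the data, but it does not give the queries the authors use. The following is only a minimal sketch of that general idea, not the paper's method: it assumes duplicates are rows whose key columns match after trimming and lowercasing, and it uses SQLite as a stand-in RDBMS. The `customer` table, its columns, and the helper `find_duplicates` are hypothetical names chosen for illustration.

```python
import sqlite3


def find_duplicates(conn, table, key_columns):
    """Return (normalized key values..., count) for keys that occur more than once.

    Assumption: a "duplicate" is an exact match on the key columns after
    trimming whitespace and lowercasing. Table and column names are
    interpolated into the SQL text, so they must come from trusted code.
    """
    # Normalize inside the database so no rows have to be exported first.
    normalized = [f"LOWER(TRIM({column}))" for column in key_columns]
    key_list = ", ".join(normalized)
    sql = (
        f"SELECT {key_list}, COUNT(*) AS n "
        f"FROM {table} "
        f"GROUP BY {key_list} "
        "HAVING COUNT(*) > 1 "
        "ORDER BY n DESC"
    )
    return conn.execute(sql).fetchall()


# Hypothetical sample data: two rows describe the same customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, phone) VALUES (?, ?)",
    [
        ("Alice Smith", "555-0100"),
        (" alice smith ", "555-0100"),
        ("Bob Jones", "555-0199"),
    ],
)

duplicates = find_duplicates(conn, "customer", ["name", "phone"])
for row in duplicates:
    print(row)
```

Because the grouping and counting happen in a single query, the database engine decides how to execute them, which is the kind of benefit from the query optimizer that the abstract mentions. Approximate matching (for example, tolerating misspellings) would need a different comparison than the exact normalized match assumed here.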