DocumentCode
1906148
Title
An efficient approach for data-duplication detection based on RDBMS
Author
Chanhom, Kiettisak ; Natwichai, Juggapong
Author_Institution
Comput. Eng. Dept., Chiang Mai Univ., Chiang Mai, Thailand
fYear
2011
fDate
11-13 May 2011
Firstpage
325
Lastpage
330
Abstract
Data duplication is one of the most important issues in information system management. Instead of a single real-world object being stored as one entity in an information system, duplication can occur, in which more than one entity represents the same object. This problem can degrade the quality of service of information systems. In this paper, we propose an efficient approach to detecting duplication that is built on the RDBMS. Our approach assumes that the data to be processed are already stored in the RDBMS. Thus, the proposed approach does not require the data to be imported into or exported from the storage. The approach also benefits from the query optimizer of the RDBMS. Experimental results on the TPC-H dataset are presented to validate the proposed work.
Keywords
data handling; relational databases; RDBMS; TPC-H dataset; data duplication detection; information system management; quality of service; query optimizer; relational database management system; detection; duplication; efficiency; query optimization
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Software Engineering (JCSSE), 2011 Eighth International Joint Conference on
Conference_Location
Nakhon Pathom
Print_ISBN
978-1-4577-0686-8
Type
conf
DOI
10.1109/JCSSE.2011.5930142
Filename
5930142