DocumentCode
3151411
Title
An Algorithm for Detecting Similar Data in Replicated Databases Using Multi Criteria Decision Making
Author
Sorkhabi, Vahideh Baradaran ; Derakhshi, M.-R.F. ; Shahamfar, Hadi
Author_Institution
Dept. of Comput. Eng., Azad Univ. Shabestar Branch, Tabriz, Iran
fYear
2009
fDate
28-30 Dec. 2009
Firstpage
199
Lastpage
203
Abstract
Identical data may cause many problems in all types of databases, especially distributed and replicated databases. Such data undermine consistency and introduce redundancy, two important problems in databases. Databases or replicas may contain similar records that differ in appearance but refer to the same real-world entity. Such duplicates arise for many reasons, including entry errors, unstandardized abbreviations, differences in detail between database schemas, packet loss, and noisy environments. This paper proposes an approach to detect duplicate or similar data among replicas in distributed or replicated databases, where faulty or noisy records would otherwise be treated as different data. A multi-criteria decision making algorithm is employed for this purpose. To detect identical records, priorities are first defined for the fields, and then the percentage similarity of records is evaluated. The algorithm's time overhead is reduced by using a special ordering of the priorities. The multi-criteria decision making algorithm is also used to decide how records should be combined and which record is the complete and correct one. An instance-based learning approach is employed to learn how to set priorities for the various fields, to create a uniform schema, and to find the appropriate matching fields in the other replica.
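The abstract describes three steps: a priority-weighted similarity percentage, a priority ordering that reduces time overhead, and a multi-criteria decision that picks the complete record. The following is a minimal sketch of how those steps could fit together, not the authors' algorithm. Several pieces are assumptions: the field names and weights in `PRIORITIES` are hypothetical and fixed by hand, whereas the paper learns priorities with instance-based learning. The sketch uses `difflib.SequenceMatcher` as the field-level similarity measure. It interprets the speed-up as early termination once the remaining fields cannot lift the score to the threshold. For the decision step it substitutes simple additive weighting (SAW) over two hypothetical criteria, completeness and consensus.

```python
from difflib import SequenceMatcher

# Hypothetical field priorities (weights). The paper learns these with
# instance-based learning; here they are simply fixed by hand.
PRIORITIES = {"name": 0.4, "phone": 0.3, "city": 0.2, "zip": 0.1}

# Hypothetical criteria weights for the SAW decision step (an assumption).
CRITERIA_WEIGHTS = {"completeness": 0.6, "consensus": 0.4}


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values (difflib ratio, an assumption)."""
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_percent(r1: dict, r2: dict, threshold: float = 80.0):
    """Weighted similarity in percent, comparing fields in descending priority.

    Stops early once even perfect agreement on the remaining fields could not
    reach the threshold. Returns (score, is_duplicate); after an early stop the
    score covers only the fields compared so far.
    """
    total = sum(PRIORITIES.values())
    fields = sorted(PRIORITIES, key=PRIORITIES.get, reverse=True)
    score = 0.0
    remaining = total
    for f in fields:
        w = PRIORITIES[f]
        remaining -= w
        score += w * field_similarity(r1.get(f, ""), r2.get(f, ""))
        # Best case: every field not yet compared matches perfectly.
        if 100.0 * (score + remaining) / total < threshold:
            return 100.0 * score / total, False
    pct = 100.0 * score / total
    return pct, pct >= threshold


def completeness(r: dict) -> float:
    """Fraction of prioritized fields that are non-empty."""
    return sum(1 for f in PRIORITIES if r.get(f)) / len(PRIORITIES)


def consensus(r: dict, group: list) -> float:
    """Mean similarity (in [0, 1]) of r to the other records in its duplicate group."""
    others = [o for o in group if o is not r]
    if not others:
        return 1.0
    # threshold=0.0 disables early termination, so full scores are returned.
    total = sum(similarity_percent(r, o, threshold=0.0)[0] for o in others)
    return total / (100.0 * len(others))


def best_record(group: list) -> dict:
    """Pick the record with the highest simple-additive-weighting score."""
    def saw(r: dict) -> float:
        return (CRITERIA_WEIGHTS["completeness"] * completeness(r)
                + CRITERIA_WEIGHTS["consensus"] * consensus(r, group))
    return max(group, key=saw)


if __name__ == "__main__":
    a = {"name": "John Smith", "phone": "555-1234", "city": "Tabriz", "zip": ""}
    b = {"name": "Jon Smith", "phone": "555-1234", "city": "Tabriz", "zip": "51368"}
    pct, dup = similarity_percent(a, b)
    print(round(pct, 1), dup)       # 87.9 True
    print(best_record([a, b])["zip"])  # 51368
```

Sorting fields by descending weight means the heaviest fields are compared first. Mismatched records therefore tend to fail the upper-bound check after only one or two comparisons. That early exit is one plausible reading of "special order of priorities".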
Keywords
database management systems; decision making; operations research; databases schemas; instance based learning approach; multicriteria decision making algorithm; similar data detection; Computer science; Data engineering; Decision making; Delay; Distributed computing; Distributed databases; Mathematics; Redundancy; Scalability; Working environment noise; Replicated database; distributed database; instance based learning; similar data;
fLanguage
English
Publisher
ieee
Conference_Titel
Environmental and Computer Science, 2009. ICECS '09. Second International Conference on
Conference_Location
Dubai
Print_ISBN
978-0-7695-3937-9
Electronic_ISBN
978-1-4244-5591-1
Type
conf
DOI
10.1109/ICECS.2009.71
Filename
5383525