• DocumentCode
    3151411
  • Title

    An Algorithm for Detecting Similar Data in Replicated Databases Using Multi Criteria Decision Making

  • Author

    Sorkhabi, Vahideh Baradaran ; Derakhshi, M.-R.F. ; Shahamfar, Hadi

  • Author_Institution
    Dept. of Comput. Eng., Azad Univ. Shabestar Branch, Tabriz, Iran
  • fYear
    2009
  • fDate
    28-30 Dec. 2009
  • Firstpage
    199
  • Lastpage
    203
  • Abstract
    Duplicate data can cause many problems in all types of databases, especially distributed and replicated databases. Such data undermine consistency and increase redundancy, two important concerns in databases. Databases or replicas may contain similar records that differ in appearance but refer to the same real-world entity. Causes of such duplicates include entry errors, unstandardized abbreviations, differences in the details of database schemas, packet loss, and noisy environments. This paper proposes an approach to detect duplicate or similar data that, because they are faulty or noisy, are treated as different data among the replicas of a distributed or replicated database. A multi-criteria decision making algorithm is employed for this purpose. To detect identical records, priorities are first defined for the fields, and then the percentage of similarity between records is evaluated. The algorithm's time overhead is reduced by using a special order of priorities. The multi-criteria decision making algorithm is also used to decide how to combine records and which record is the complete and true one. An instance-based learning approach is employed to learn how to set priorities for the fields, to create a uniform schema, and to find the matching fields in other replicas.
  • Keywords
    database management systems; decision making; operations research; databases schemas; instance based learning approach; multicriteria decision making algorithm; similar data detection; Computer science; Data engineering; Decision making; Delay; Distributed computing; Distributed databases; Mathematics; Redundancy; Scalability; Working environment noise; Replicated database; distributed database; instance based learning; similar data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Environmental and Computer Science, 2009. ICECS '09. Second International Conference on
  • Conference_Location
    Dubai
  • Print_ISBN
    978-0-7695-3937-9
  • Electronic_ISBN
    978-1-4244-5591-1
  • Type

    conf

  • DOI
    10.1109/ICECS.2009.71
  • Filename
    5383525
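
The abstract describes scoring record similarity from per-field priorities and using the priority order to reduce running time. The record does not give the paper's actual formulas, so the following is only a minimal sketch under stated assumptions: priorities are treated as weights in a simple weighted sum (one common multi-criteria scoring scheme, not necessarily the authors'), string similarity comes from Python's `difflib.SequenceMatcher`, and the names `field_similarity`, `is_duplicate`, `priorities`, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_duplicate(rec_a: dict, rec_b: dict, priorities: dict, threshold: float = 0.8):
    """Return (duplicate?, score) for two records.

    Assumption: each priority is a positive weight, and the record score is
    the weighted average of the field similarities. Fields are compared in
    descending priority so that a hopeless pair can be rejected early.
    On an early exit the returned score covers only the fields compared so far.
    """
    total = sum(priorities.values())
    score = 0.0          # weighted similarity accumulated so far
    remaining = total    # weight of the fields not yet compared
    ordered = sorted(priorities.items(), key=lambda kv: kv[1], reverse=True)
    for field, weight in ordered:
        sim = field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        score += weight * sim
        remaining -= weight
        # Best case: every remaining field matches perfectly. If even that
        # cannot reach the threshold, stop comparing.
        if (score + remaining) / total < threshold:
            return False, score / total
    final = score / total
    return final >= threshold, final


# Hypothetical records and priorities, for illustration only.
priorities = {"name": 3, "phone": 2, "city": 1}
a = {"name": "Jon Smith", "city": "Tabriz", "phone": "555-0101"}
b = {"name": "John Smith", "city": "tabriz", "phone": "555-0101"}
c = {"name": "Alice Brown", "city": "Dubai", "phone": "555-9999"}

print(is_duplicate(a, b, priorities))
print(is_duplicate(a, c, priorities))
```

Comparing the highest-priority fields first means a strong mismatch on an important field can end the comparison before the lower-priority fields are examined, which is one plausible reading of the "special order of priorities" mentioned in the abstract.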