Regional Information Center for Science and Technology - Reducing inconsistency in integrating data from different sources

DocumentCode :

3305771

Title :

Reducing inconsistency in integrating data from different sources

Author :

Luján-Mora, Sergio ; Palomar, Manuel

Author_Institution :

Dept. de Lenguajes y Sistemas Inf., Alicante Univ., Spain

fYear :

2001

fDate :

2001

Firstpage :

209

Lastpage :

218

Abstract :

One of the main problems in integrating databases into a common repository is the possible inconsistency of the values stored in them: the very same term may appear with different values because of misspellings, permuted word order, spelling variants, and so on. The authors present an automatic method for reducing the inconsistency found in existing databases and thus improving data quality. All the values that refer to the same term are clustered by measuring their degree of similarity. The clustered values can then be assigned a common value that, in principle, could be substituted for the original values. Four different similarity measures are evaluated for clustering, with and without expansion of abbreviations. The proposed method may work well in practice, but it is time-consuming. To mitigate this, stop words are removed to speed up the clustering.
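
The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the four similarity measures, so the sketch substitutes the ratio from Python's standard `difflib.SequenceMatcher`. The stop word set, the abbreviation table, the greedy clustering strategy, the 0.85 threshold, and all function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical stop words and abbreviation table, for illustration only.
STOP_WORDS = {"of", "the", "and", "de", "for"}
ABBREVIATIONS = {"univ": "university", "dept": "department"}


def normalize(value, expand=True, drop_stop_words=True):
    """Lowercase, replace punctuation with spaces, optionally expand
    abbreviations and drop stop words, then sort the tokens so that a
    permuted word order yields the same string."""
    cleaned = "".join(c if c.isalnum() else " " for c in value.lower())
    tokens = cleaned.split()
    if expand:
        tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Stand-in similarity measure in [0, 1] (assumption: not one of the
    measures evaluated in the paper)."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(values, threshold=0.85):
    """Greedy single-pass clustering: each value joins the first cluster
    whose normalized representative is similar enough, otherwise it
    starts a new cluster."""
    clusters = []  # list of (normalized representative, original values)
    for value in values:
        key = normalize(value)
        for representative, members in clusters:
            if similarity(key, representative) >= threshold:
                members.append(value)
                break
        else:
            clusters.append((key, [value]))
    return [members for _, members in clusters]
```

Calling `cluster(["University of Alicante", "Alicante Univ.", "Univ. of Alicnte"])` groups spelling variants, permuted word order, and a misspelling under one representative, from which a common value could then be chosen. Dropping stop words shortens the strings being compared, which is one plausible reading of how their removal speeds up clustering.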

Keywords :

data integrity; database management systems; pattern clustering; string matching; word processing; abbreviation expansion; automatic method; clustered values; common repository; common value; data clustering; data integration; data quality; databases; inconsistency reduction; misspelling; permuted word order; similarity degree; similarity measures; spelling variants; stop words; Cleaning; Data warehouses; Decision making; Information systems; Proposals; Relational databases

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Database Engineering and Applications, 2001 International Symposium on.

Conference_Location :

Grenoble

Print_ISBN :

0-7695-1140-6

Type :

conf

DOI :

10.1109/IDEAS.2001.938087

Filename :

938087

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3305771