Title :
Transformation-based Framework for Record Matching
Author :
Arasu, Arvind ; Chaudhuri, Surajit ; Kaushik, Raghav
Author_Institution :
Microsoft Res., Redmond, WA
Abstract :
Today's record matching infrastructure does not allow a flexible way to account for synonyms such as "Robert" and "Bob" which refer to the same name, and more general forms of string transformations such as abbreviations. We propose a programmatic framework of record matching that takes such user-defined string transformations as input. To the best of our knowledge, this is the first proposal for such a framework. This transformational framework, while expressive, poses significant computational challenges which we address. We empirically evaluate our techniques over real data.
Keywords :
data analysis; data warehouses; record matching; transformation-based framework; Marketing and sales; Proposals; Standardization
Conference_Title :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
DOI :
10.1109/ICDE.2008.4497412