DocumentCode
2079797
Title
Data cleansing as a transient service
Author
Faruquie, Tanveer A. ; Prasad, K. Hima ; Subramaniam, L. Venkata ; Mohania, Mukesh ; Venkatachaliah, Girish ; Kulkarni, Shrinivas ; Basu, Pramit
Author_Institution
IBM India Res. Lab., New Delhi, India
fYear
2010
fDate
1-6 March 2010
Firstpage
1025
Lastpage
1036
Abstract
There is often a transient need within enterprises for data cleansing, which can be satisfied by offering data cleansing as a transient service. Every time a data cleansing need arises, it should be possible to provision hardware, software and staff to accomplish the task, and then to dismantle the setup. In this paper we present such a system, which uses virtualized hardware and software for data cleansing, and we share actual experiences gained from building it. We use a cloud infrastructure to offer virtualized data cleansing instances that can be accessed as a service. The system we build is scalable, elastic and configurable. Each enterprise has unique needs, which makes it necessary to customize both the infrastructure and the cleansing algorithms to address them. The system we present is easily configurable to suit the data cleansing needs of an individual enterprise.
Keywords
Internet; data mining; cleansing algorithms; cloud infrastructure; data cleansing; transient service; virtualized data cleansing; virtualized hardware; virtualized software; Clouds; Costs; Customer service; Databases; Decision making; Delay; Error analysis; Hardware; Investments; Software maintenance
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location
Long Beach, CA
Print_ISBN
978-1-4244-5445-7
Electronic_ISBN
978-1-4244-5444-0
Type
conf
DOI
10.1109/ICDE.2010.5447789
Filename
5447789