• DocumentCode
    2079797
  • Title

    Data cleansing as a transient service

  • Author

    Faruquie, Tanveer A. ; Prasad, K. Hima ; Subramaniam, L. Venkata ; Mohania, Mukesh ; Venkatachaliah, Girish ; Kulkarni, Shrinivas ; Basu, Pramit

  • Author_Institution
    IBM India Res. Lab., New Delhi, India
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    1025
  • Lastpage
    1036
  • Abstract
    There is often a transient need within enterprises for data cleansing, which can be satisfied by offering data cleansing as a transient service. Every time a data cleansing need arises, it should be possible to provision hardware, software and staff to accomplish the task and then dismantle the setup. In this paper we present such a system, which uses virtualized hardware and software for data cleansing. We share actual experiences gained from building such a system. We use a cloud infrastructure to offer virtualized data cleansing instances that can be accessed as a service. We build a system that is scalable, elastic and configurable. Each enterprise has unique needs, which makes it necessary to customize both the infrastructure and the cleansing algorithms to address these needs. In this paper we present a system that is easily configurable to suit the data cleansing needs of an enterprise.
  • Keywords
    Internet; data mining; cleansing algorithms; cloud infrastructure; data cleansing; transient service; virtualized data cleansing; virtualized hardware; virtualized software; Clouds; Costs; Customer service; Databases; Decision making; Delay; Error analysis; Hardware; Investments; Software maintenance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-5445-7
  • Electronic_ISBN
    978-1-4244-5444-0
  • Type

    conf

  • DOI
    10.1109/ICDE.2010.5447789
  • Filename
    5447789