• DocumentCode
    234846
  • Title

    A Parallel Algorithm for Datacleansing in Incomplete Information Systems Using MapReduce

  • Author

    Fei Chen ; Lin Jiang

  • Author_Institution
    Fac. of Sci., Kunming Univ. of Sci. & Technol., Kunming, China
  • fYear
    2014
  • fDate
    15-16 Nov. 2014
  • Firstpage
    273
  • Lastpage
    277
  • Abstract
    Data cleansing is an important step in data mining and a key technology for ensuring data quality. Classical data pre-processing techniques are limited when processing massive data with missing information and sometimes cannot obtain precise and reasonable results, which leads to low-quality data. To address this, based on an in-depth analysis of classical pre-processing and combined with the MapReduce programming model, a parallel algorithm for data cleansing in incomplete information systems using MapReduce is proposed to process massive data with missing information. Finally, the new algorithm is applied to an incomplete decision information system, and the analysis results show that the new algorithm is effective.
  • Keywords
    data handling; information systems; parallel algorithms; parallel programming; MapReduce programming model; data cleansing; data mining; incomplete decision information system; parallel algorithm; Algorithm design and analysis; Cleaning; Data mining; Distributed databases; Information systems; Parallel algorithms; Programming; Data cleansing; MapReduce; incomplete information systems; massive data; rough set
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security (CIS), 2014 Tenth International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4799-7433-7
  • Type

    conf

  • DOI
    10.1109/CIS.2014.42
  • Filename
    7016899