DocumentCode
234846
Title
A Parallel Algorithm for Datacleansing in Incomplete Information Systems Using MapReduce
Author
Fei Chen ; Lin Jiang
Author_Institution
Fac. of Sci., Kunming Univ. of Sci. & Technol., Kunming, China
fYear
2014
fDate
15-16 Nov. 2014
Firstpage
273
Lastpage
277
Abstract
Data cleansing is an important step in data mining and a key technology for ensuring data quality. Classical data pre-processing techniques are limited when processing massive data with missing information, and sometimes cannot obtain precise and reasonable results, which leads to low-quality data. To address this, through an in-depth analysis of classical pre-processing combined with the MapReduce programming model, a parallel algorithm for data cleansing in incomplete information systems using MapReduce is proposed to process massive data with missing information. Finally, the new algorithm is applied to an incomplete decision information system, and the analysis results show that it is effective.
Keywords
data handling; information systems; parallel algorithms; parallel programming; MapReduce programming model; data cleansing; data mining; incomplete decision information system; parallel algorithm; Algorithm design and analysis; Cleaning; Data mining; Distributed databases; Information systems; Parallel algorithms; Programming; Data cleansing; MapReduce; incomplete information systems; massive data; rough set;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security (CIS), 2014 Tenth International Conference on
Conference_Location
Kunming
Print_ISBN
978-1-4799-7433-7
Type
conf
DOI
10.1109/CIS.2014.42
Filename
7016899