DocumentCode :
1934943
Title :
Parallel k-modes algorithm based on MapReduce
Author :
Guo Tao ; Ding Xiangwu ; Li Yefeng
Author_Institution :
Coll. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
fYear :
2015
fDate :
3-5 Feb. 2015
Firstpage :
176
Lastpage :
179
Abstract :
K-modes is a typical clustering algorithm for categorical data. First, we improve the K-modes procedure: as categorical objects are allocated to clusters, the count of each attribute item in each cluster is updated, so that the new cluster modes can be computed after reading the whole dataset once. To make K-modes capable of handling large-scale categorical data, we then implement it on Hadoop using the MapReduce parallel computing model. Experiments show that parallel k-modes achieves a good speedup ratio when dealing with large-scale categorical data.
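The counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Hadoop implementation: the function names (`assign_and_count`, `merge_counts`, `modes_from_counts`) are hypothetical, the simple-matching (Hamming) dissimilarity is an assumption because the abstract does not name the measure, and keeping the old mode for an empty cluster is likewise an assumed policy. Plain functions stand in for the map and reduce steps.

```python
from collections import Counter

def hamming(a, b):
    """Simple-matching dissimilarity (assumed; the abstract does not name the measure)."""
    return sum(x != y for x, y in zip(a, b))

def assign_and_count(records, modes):
    """Map-like step: assign each record to its nearest mode and
    update per-cluster, per-attribute value counts in the same pass."""
    k, n_attrs = len(modes), len(modes[0])
    counts = [[Counter() for _ in range(n_attrs)] for _ in range(k)]
    labels = []
    for rec in records:
        c = min(range(k), key=lambda i: hamming(rec, modes[i]))
        labels.append(c)
        for j, value in enumerate(rec):
            counts[c][j][value] += 1
    return labels, counts

def merge_counts(a, b):
    """Reduce-like step: add up the count tables produced by two data partitions."""
    return [[a[c][j] + b[c][j] for j in range(len(a[c]))] for c in range(len(a))]

def modes_from_counts(counts, old_modes):
    """New mode of each cluster: the most frequent value of every attribute.
    An empty cluster keeps its old mode (an assumed policy)."""
    new_modes = []
    for c, per_attr in enumerate(counts):
        if not any(per_attr):
            new_modes.append(tuple(old_modes[c]))
        else:
            new_modes.append(tuple(cnt.most_common(1)[0][0] for cnt in per_attr))
    return new_modes

if __name__ == "__main__":
    records = [("a", "x"), ("a", "y"), ("a", "x"),
               ("b", "z"), ("b", "z"), ("c", "z")]
    modes = [("a", "x"), ("b", "z")]
    # Two partitions processed independently, then merged.
    _, counts_1 = assign_and_count(records[:3], modes)
    _, counts_2 = assign_and_count(records[3:], modes)
    print(modes_from_counts(merge_counts(counts_1, counts_2), modes))
```

Because the count tables are additive, partitions can be processed independently and merged afterwards, which is what makes a single read of the dataset per iteration sufficient in this sketch.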
Keywords :
parallel processing; pattern clustering; Hadoop; MapReduce parallel computing model; attribute item; categorical clustering algorithm; large-scale categorical data; parallel k-modes algorithm; speedup ratio; Clustering algorithms; Computational modeling; Computers; Data models; Educational institutions; Parallel processing; Servers; MapReduce; categorical data; k-modes; parallel clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information, Networking, and Wireless Communications (DINWC), 2015 Third International Conference on
Conference_Location :
Moscow
Print_ISBN :
978-1-4799-6375-1
Type :
conf
DOI :
10.1109/DINWC.2015.7054238
Filename :
7054238