Title :
Parallel k-modes algorithm based on MapReduce
Author :
Guo Tao ; Ding Xiangwu ; Li Yefeng
Author_Institution :
Coll. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
Abstract :
K-modes is a typical clustering algorithm for categorical data. We first improve the k-modes procedure: when categorical objects are allocated to clusters, the count of each attribute item in each cluster is updated, so that the new cluster modes can be computed after reading the whole dataset once. To make k-modes capable of handling large-scale categorical data, we then implement it on Hadoop using the MapReduce parallel computing model. Experiments show that parallel k-modes achieves a good speedup ratio when dealing with large-scale categorical data.
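The single-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Hadoop implementation: it is plain Python that simulates a map step (per-split counting) and a reduce step (merging counts and picking modes). The function names `count_split`, `merge_counts`, and `new_modes` are hypothetical, and simple matching (Hamming) dissimilarity is assumed as the distance measure.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def count_split(records, modes):
    """Map-like step (hypothetical): assign each record in one data split to
    its nearest mode and update per-cluster, per-attribute value counts."""
    n_attrs = len(modes[0])
    counts = [[Counter() for _ in range(n_attrs)] for _ in modes]
    for rec in records:
        # Ties are broken in favour of the lowest cluster index.
        nearest = min(range(len(modes)), key=lambda c: hamming(rec, modes[c]))
        for j, value in enumerate(rec):
            counts[nearest][j][value] += 1
    return counts


def merge_counts(partials):
    """Reduce-like step (hypothetical): sum the counts from all splits."""
    k, m = len(partials[0]), len(partials[0][0])
    merged = [[Counter() for _ in range(m)] for _ in range(k)]
    for part in partials:
        for c in range(k):
            for j in range(m):
                merged[c][j].update(part[c][j])
    return merged


def new_modes(counts, old_modes):
    """Pick the most frequent value per attribute as the new mode.
    An empty cluster keeps its previous mode."""
    return [
        tuple(
            counter.most_common(1)[0][0] if counter else old_modes[c][j]
            for j, counter in enumerate(per_attr)
        )
        for c, per_attr in enumerate(counts)
    ]


if __name__ == "__main__":
    modes = [("red", "small"), ("blue", "large")]
    split_a = [("red", "small"), ("red", "large"), ("blue", "large")]
    split_b = [("red", "small"), ("blue", "large"), ("blue", "small")]
    merged = merge_counts([count_split(s, modes) for s in (split_a, split_b)])
    print(new_modes(merged, modes))
```

Because the counts are additive, each split can be processed independently and the results merged afterwards, so one iteration needs only a single read of the dataset.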
Keywords :
parallel processing; pattern clustering; Hadoop; MapReduce parallel computing model; attribute item; categorical clustering algorithm; large-scale categorical data; parallel k-modes algorithm; speedup ratio; Clustering algorithms; Computational modeling; Computers; Data models; Educational institutions; Parallel processing; Servers; MapReduce; categorical data; k-modes; parallel clustering;
Conference_Title :
Digital Information, Networking, and Wireless Communications (DINWC), 2015 Third International Conference on
Conference_Location :
Moscow
Print_ISBN :
978-1-4799-6375-1
DOI :
10.1109/DINWC.2015.7054238