DocumentCode :
2664656
Title :
Design and Implementation of Parallel Term Contribution Algorithm Based on Mapreduce Model
Author :
Peng Chao ; Wu Bin ; Deng Chao
Author_Institution :
Beijing Univ. of Posts & Telecommun. BUPT, Beijing, China
fYear :
2012
fDate :
19-20 June 2012
Firstpage :
43
Lastpage :
47
Abstract :
MapReduce is a software framework introduced by Google in 2004 to support distributed computing over large datasets on clusters of computers [1]. The term contribution (TC) algorithm is a relatively new text-mining algorithm that selects features for clustering. In this paper, we design and implement a parallel term contribution (PTC) algorithm based on the MapReduce model. Our experiments show that the MapReduce framework greatly improves the performance of TC.
Keywords :
data mining; parallel algorithms; pattern clustering; text analysis; Mapreduce model; PTC algorithm; clustering; computer cluster; distributed computing; parallel term contribution algorithm design; software framework; text mining; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data models; Software algorithms; Text mining; Vectors; Feature Selection; Hadoop; MapReduce; Term Contribution Algorithm; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Open Cirrus Summit (OCS), 2012 Seventh
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/OCS.2012.39
Filename :
6695839