DocumentCode :
3198591
Title :
Distributed PrefixSpan algorithm based on MapReduce
Author :
Yong-qing Wei ; Dong Liu ; Lin-shan Duan
Author_Institution :
Basic Educ. Dept., Shandong Police Coll., Jinan, China
Volume :
2
fYear :
2012
fDate :
3-5 Aug. 2012
Firstpage :
901
Lastpage :
904
Abstract :
To mine sequential patterns in massive data sets, a distributed sequential pattern mining algorithm, MR-PrefixSpan, is proposed, based on the MapReduce programming model and PrefixSpan. The mining task is decomposed into many small tasks: the Map function mines each prefix-projected sequential pattern, and the projected databases are constructed in parallel. This reduces the search space and improves mining efficiency. The intermediate values are then passed to a Reduce function, which merges them to produce a possibly smaller set of values. Both theoretical analysis and experimental results show that MR-PrefixSpan reduces the time spent scanning the database. It mines massive data effectively and achieves considerable speedup and scaleup as the number of processors on the Hadoop platform increases.
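The Map/Reduce split described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' implementation and does not use Hadoop: it assumes each sequence is a list of single items (no itemsets), and all function names (`project`, `frequent_items`, `prefixspan`, `map_task`, `reduce_task`, `mine`) are hypothetical. Each map task mines the projected database of one frequent item, and the reduce step merges the per-prefix results.

```python
from collections import defaultdict


def project(db, item):
    """Build the projected database: the suffix after the first `item` in each sequence."""
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend a pattern
                projected.append(suffix)
    return projected


def frequent_items(db, min_support):
    """Count each item once per sequence and keep those meeting min_support."""
    counts = defaultdict(int)
    for seq in db:
        for item in set(seq):
            counts[item] += 1
    return {item: c for item, c in counts.items() if c >= min_support}


def prefixspan(db, prefix, min_support):
    """Recursively yield (pattern, support) for patterns that extend `prefix`."""
    for item, count in sorted(frequent_items(db, min_support).items()):
        pattern = prefix + (item,)
        yield pattern, count
        yield from prefixspan(project(db, item), pattern, min_support)


def map_task(prefix_item, db, min_support):
    """Map (assumed shape): mine every longer pattern that starts with one frequent item."""
    return list(prefixspan(project(db, prefix_item), (prefix_item,), min_support))


def reduce_task(mapped):
    """Reduce (assumed shape): merge per-prefix outputs into one pattern -> support table."""
    merged = {}
    for results in mapped:
        merged.update(results)
    return merged


def mine(db, min_support):
    """Run one map task per frequent item, then merge; sequential stand-in for a cluster."""
    seeds = frequent_items(db, min_support)
    mapped = [map_task(item, db, min_support) for item in sorted(seeds)]
    patterns = reduce_task(mapped)
    patterns.update({(item,): c for item, c in seeds.items()})
    return patterns


if __name__ == "__main__":
    sequences = [list("abc"), list("acb"), list("ab"), list("bc")]
    for pattern, support in sorted(mine(sequences, min_support=2).items()):
        print("".join(pattern), support)
```

Because each map task touches only the projected database of its own prefix, the tasks share no state and could in principle run on separate nodes; the list comprehension in `mine` stands in for that parallel dispatch.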
Keywords :
data mining; parallel databases; Hadoop platform; MR-PrefixSpan; Map function; MapReduce programming model; distributed PrefixSpan algorithm; distributed sequential pattern mining algorithm; intermediate values; massive data set; prefix-projected sequential pattern mining; projected parallel databases; reduce function; search space; Indium phosphide; MapReduce model; PrefixSpan algorithm; cloud computing; parallel dispose; sequential pattern
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology in Medicine and Education (ITME), 2012 International Symposium on
Conference_Location :
Hakodate, Hokkaido
Print_ISBN :
978-1-4673-2109-9
Type :
conf
DOI :
10.1109/ITiME.2012.6291449
Filename :
6291449