DocumentCode
3198591
Title
Distributed PrefixSpan algorithm based on MapReduce
Author
Yong-qing Wei ; Dong Liu ; Lin-shan Duan
Author_Institution
Basic Educ. Dept., Shandong Police Coll., Jinan, China
Volume
2
fYear
2012
fDate
3-5 Aug. 2012
Firstpage
901
Lastpage
904
Abstract
To mine sequential patterns in massive data sets, this paper proposes MR-PrefixSpan, a distributed sequential pattern mining algorithm based on the MapReduce programming model and PrefixSpan. The mining task is decomposed into many small tasks: the Map function mines each prefix-projected sequential pattern, and the projected databases are constructed in parallel. This reduces the search space and improves mining efficiency. The intermediate values are then passed to a Reduce function, which merges them to produce a possibly smaller set of values. Both theoretical analysis and experimental results show that MR-PrefixSpan reduces the time spent scanning the database. It mines massive data effectively and achieves considerable speedup and scaleup as the number of processors on the Hadoop platform increases.
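The Map/Reduce split described in the abstract can be illustrated with a small in-process sketch. This is not the authors' implementation and does not use Hadoop: it assumes sequences of single items, simulates the map and reduce phases with plain Python loops, and assigns one map task to each frequent length-1 prefix. All function names (`project`, `frequent_items`, `prefixspan`, `map_task`, `reduce_task`, `mr_prefixspan`) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def project(db, item):
    """Build the projected database: the suffix after the first `item` in each sequence."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def frequent_items(db, min_support):
    """Count each item once per sequence and keep those meeting min_support."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    return {item: c for item, c in counts.items() if c >= min_support}


def prefixspan(db, prefix, min_support):
    """Recursively yield (pattern, support) for every frequent extension of `prefix`."""
    for item, count in sorted(frequent_items(db, min_support).items()):
        pattern = prefix + (item,)
        yield pattern, count
        yield from prefixspan(project(db, item), pattern, min_support)


def map_task(db, item, min_support):
    """Assumed map step: mine all patterns that start with one frequent item."""
    projected = project(db, item)
    yield (item,), len(projected)  # support of the length-1 prefix itself
    yield from prefixspan(projected, (item,), min_support)


def reduce_task(pairs):
    """Assumed reduce step: merge intermediate (pattern, support) pairs into one result."""
    merged = {}
    for pattern, support in pairs:
        # Each pattern comes from exactly one prefix partition, so max() only dedupes.
        merged[pattern] = max(merged.get(pattern, 0), support)
    return merged


def mr_prefixspan(db, min_support):
    """Run the simulated map phase over all frequent items, then reduce."""
    intermediate = []
    for item in sorted(frequent_items(db, min_support)):
        intermediate.extend(map_task(db, item, min_support))
    return reduce_task(intermediate)


if __name__ == "__main__":
    sequences = [list("abc"), list("acb"), list("ab"), list("bc")]
    for pattern, support in sorted(mr_prefixspan(sequences, 2).items()):
        print("".join(pattern), support)
```

Because each map task works only on its own projected database, the tasks are independent of one another, which is the property that would let a real MapReduce job run them on separate nodes.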
Keywords
data mining; parallel databases; Hadoop platform; MR-PrefixSpan; Map function; MapReduce programming model; distributed PrefixSpan algorithm; distributed sequential pattern mining algorithm; intermediate values; massive data set; prefix-projected sequential pattern mining; projected parallel databases; reduce function; search space; Indium phosphide; MapReduce model; PrefixSpan algorithm; cloud computing; parallel dispose; sequential pattern
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology in Medicine and Education (ITME), 2012 International Symposium on
Conference_Location
Hakodate, Hokkaido
Print_ISBN
978-1-4673-2109-9
Type
conf
DOI
10.1109/ITiME.2012.6291449
Filename
6291449