• DocumentCode
    3198591
  • Title

    Distributed PrefixSpan algorithm based on MapReduce

  • Author

    Yong-qing Wei ; Dong Liu ; Lin-shan Duan

  • Author_Institution
    Basic Educ. Dept., Shandong Police Coll., Jinan, China
  • Volume
    2
  • fYear
    2012
  • fDate
    3-5 Aug. 2012
  • Firstpage
    901
  • Lastpage
    904
  • Abstract
    To mine sequential patterns from massive data sets, a distributed sequential pattern mining algorithm, MR-PrefixSpan, is proposed, based on the MapReduce programming model and PrefixSpan. The mining task is decomposed into many small tasks: the Map function mines each prefix-projected sequential pattern, and the projected databases are constructed in parallel. This narrows the search space and improves mining efficiency. The intermediate values are then passed to a Reduce function, which merges them to produce a possibly smaller set of values. Both theoretical analysis and experimental results show that MR-PrefixSpan reduces the time spent scanning the database. It mines massive data effectively and shows considerable speedup and scaleup as the number of processors on the Hadoop platform increases.
  • Keywords
    data mining; parallel databases; Hadoop platform; MR-PrefixSpan; Map function; MapReduce programming model; distributed PrefixSpan algorithm; distributed sequential pattern mining algorithm; intermediate values; massive data set; prefix-projected sequential pattern mining; projected parallel databases; reduce function; search space; Indium phosphide; MapReduce model; PrefixSpan algorithm; cloud computing; parallel dispose; sequential pattern
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology in Medicine and Education (ITME), 2012 International Symposium on
  • Conference_Location
    Hakodate, Hokkaido
  • Print_ISBN
    978-1-4673-2109-9
  • Type

    conf

  • DOI
    10.1109/ITiME.2012.6291449
  • Filename
    6291449
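
The Map/Reduce split described in the abstract can be illustrated with a small in-process sketch. It is not the authors' MR-PrefixSpan implementation and does not use Hadoop. It assumes each sequence is a plain list of single items (no itemset elements) and simulates the framework with ordinary function calls. All names (`project`, `map_task`, `reduce_task`, `mr_prefixspan`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def frequent_items(db, min_support):
    """Count, per item, how many sequences contain it; keep the frequent ones."""
    counts = defaultdict(int)
    for seq in db:
        for item in set(seq):
            counts[item] += 1
    return {item: c for item, c in counts.items() if c >= min_support}


def project(db, item):
    """Build the projected database: the suffix after the first occurrence of item."""
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend a pattern
                projected.append(suffix)
    return projected


def prefixspan(prefix, projected, min_support):
    """Recursively grow patterns from prefix, yielding (pattern, support) pairs."""
    for item, count in sorted(frequent_items(projected, min_support).items()):
        pattern = prefix + (item,)
        yield pattern, count
        yield from prefixspan(pattern, project(projected, item), min_support)


def map_task(item, support, db, min_support):
    """Map step (assumed design): mine all patterns that start with one frequent item."""
    yield (item,), support
    yield from prefixspan((item,), project(db, item), min_support)


def reduce_task(mapped_outputs):
    """Reduce step (assumed design): merge the outputs of all map tasks."""
    merged = {}
    for pairs in mapped_outputs:
        for pattern, support in pairs:
            merged[pattern] = support
    return merged


def mr_prefixspan(db, min_support):
    """Run one independent map task per frequent item, then merge the results."""
    seeds = frequent_items(db, min_support)
    mapped = [list(map_task(i, s, db, min_support)) for i, s in sorted(seeds.items())]
    return reduce_task(mapped)


db = [list("abcb"), list("abc"), list("bca"), list("ab")]
patterns = mr_prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print("".join(pattern), support)
```

Because each map task depends only on its own prefix and the input database, the tasks are independent of one another, which is what would allow them to be distributed across workers in a real MapReduce job.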