• DocumentCode
    2922837
  • Title

    DH-TRIE frequent pattern mining on Hadoop using JPA

  • Author

    Yang, Lai ; Shi, Zhongzhi ; Xu, Li D. ; Liang, Fan ; Kirsh, Ilan

  • Author_Institution
    Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
  • fYear
    2011
  • fDate
    8-10 Nov. 2011
  • Firstpage
    875
  • Lastpage
    878
  • Abstract
    FPgrowth is a well-known frequent pattern mining algorithm used in data mining on high-dimensional, large-scale data sets. It is also known for its high memory complexity, which results from its recursive processing. In general, FPgrowth cannot handle a large-scale data set unless the whole data set is divided into small blocks. Based on Hadoop, the open cloud computing model, a distributed DH-TRIE frequent pattern algorithm using JPA is proposed, which solves three problems (globalization, random write and duration). Comparisons with the Mahout project show that the algorithm has good flexibility and scalability. Applied to the Vega Cloud virtualization platform, the algorithm will be usable in a wide range of situations.
  • Keywords
    Java; application program interfaces; cloud computing; data mining; pattern clustering; FPgrowth; Hadoop; JPA; Vega cloud; distributed DH-TRIE frequent pattern algorithm; duration problem; far-ranging situations; globalization problem; high dimensional large scale data sets; open cloud computing model; random write problem; recursive processing; scalability; virtualization platform; Data models; Indexing; Programming; ORM; virtual machine
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Granular Computing (GrC), 2011 IEEE International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-1-4577-0372-0
  • Type

    conf

  • DOI
    10.1109/GRC.2011.6122552
  • Filename
    6122552