DH-TRIE frequent pattern mining on Hadoop using JPA

Author

Yang, Lai ; Shi, Zhongzhi ; Xu, Li D. ; Liang, Fan ; Kirsh, Ilan

Author_Institution

Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China

fYear

2011

fDate

8-10 Nov. 2011

Firstpage

875

Lastpage

878

Abstract

FPgrowth is a well-known frequent pattern mining algorithm for high-dimensional, large-scale data sets. It is also known for its high memory complexity, which results from its recursive processing. In general, FPgrowth cannot handle a large-scale data set unless the whole data set is divided into small blocks. Based on Hadoop, the open cloud computing model, a distributed DH-TRIE frequent pattern algorithm using JPA is proposed, which solves three problems: globalization, random write, and duration. Comparisons with the Mahout project show that the algorithm has good flexibility and scalability. Applied to the Vega Cloud virtualization platform, the algorithm can be used in a wide range of situations.

Keywords

Java; application program interfaces; cloud computing; data mining; pattern clustering; FPgrowth; Hadoop; JPA; Vega cloud; distributed DH-TRIE frequent pattern algorithm; duration problem; far-ranging situations; globalization problem; high dimensional large scale data sets; open cloud computing model; random write problem; recursive processing; scalability; virtualization platform; Data models; Indexing; Programming; ORM; virtual machine

fLanguage

English

Publisher

IEEE

Conference_Titel

Granular Computing (GrC), 2011 IEEE International Conference on

Conference_Location

Kaohsiung

Print_ISBN

978-1-4577-0372-0

Type

conf

DOI

10.1109/GRC.2011.6122552

Filename

6122552