Towards efficient and scalable data mining using Spark

Author

Jie Deng; Zhiguo Qu; Yongxu Zhu; Gabriel-Miro Muntean; Xiaojun Wang

Author_Institution

The Rince Institute, Dublin City University, Ireland

fYear

2014

fDate

15-17 May 2014

Firstpage

1

Lastpage

6

Abstract

Driven by the need to discover valuable information in rapidly growing data, data mining technologies have drawn considerable attention over the last decade. However, the big data era places even higher demands on data mining technologies in terms of both processing speed and data volume. A data mining algorithm on its own can hardly meet these requirements for effective processing of big data, so distributed systems are proposed. In this paper, a novel method of integrating a sequential pattern mining algorithm with Spark, a fast large-scale data processing engine, is proposed to mine patterns in big data. We use the well-known PrefixSpan algorithm as an example to demonstrate how this method helps handle massive data rapidly and conveniently. The experiments show that this method can make full use of cluster computing resources to accelerate the mining process, with a better performance than

Keywords

Distributed system; MapReduce model; PrefixSpan algorithm; Spark platform

fLanguage

English

Publisher

IET

Conference_Titel

Information and Communications Technologies (ICT 2014), 2014 International Conference on

Conference_Location

Nanjing, China

Type

conf

DOI

10.1049/cp.2014.0616

Filename

6913669