DocumentCode
259681
Title
Towards efficient and scalable data mining using Spark
Author
Jie Deng ; Zhiguo Qu ; Yongxu Zhu ; Gabriel-Miro Muntean ; Xiaojun Wang
Author_Institution
The RINCE Institute, Dublin City University, Ireland
fYear
2014
fDate
15-17 May 2014
Firstpage
1
Lastpage
6
Abstract
Driven by the need to discover valuable information in rapidly growing data, data mining technologies have drawn wide attention over the last decade. However, the big data era places even higher demands on data mining technologies in terms of both processing speed and data volume. A data mining algorithm on its own can hardly meet these requirements for effective big data processing, so the use of distributed systems has been proposed. In this paper, a novel method of integrating a sequential pattern mining algorithm with Spark, a fast large-scale data processing engine, is proposed to mine patterns in big data. We use the well-known PrefixSpan algorithm as an example to demonstrate how this method helps handle massive data rapidly and conveniently. The experiments show that this method can make full use of cluster computing resources to accelerate the mining process, with a better performance than
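The following is a minimal sketch of the textbook PrefixSpan algorithm named in the abstract. It is not the authors' Spark integration. It assumes each sequence element is a single item rather than an itemset, and that support is an absolute count of sequences. It also shows one common way to split the work: patterns that start with different frequent items are mined from independent projected databases. That partitioning scheme, and every function name here (`project`, `frequent_items`, `mine_partition`, `parallel_style_prefixspan`), is a hypothetical illustration. The plain Python `map` stands in for what a distributed engine would do as a map over the frequent root items.

```python
from collections import Counter
from typing import List, Tuple

Sequence = List[str]
Pattern = Tuple[str, ...]


def project(db: List[Sequence], item: str) -> List[Sequence]:
    """Projected database: the suffix after the first occurrence of `item` in each sequence."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def frequent_items(db: List[Sequence], min_support: int) -> dict:
    """Count each item at most once per sequence and keep those meeting min_support."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    return {item: c for item, c in counts.items() if c >= min_support}


def prefixspan(db: List[Sequence], min_support: int, prefix: Pattern = ()) -> List[Tuple[Pattern, int]]:
    """Sequential PrefixSpan: grow the prefix by each frequent item, then recurse on its projection."""
    patterns = []
    for item, support in sorted(frequent_items(db, min_support).items()):
        new_prefix = prefix + (item,)
        patterns.append((new_prefix, support))
        patterns.extend(prefixspan(project(db, item), min_support, new_prefix))
    return patterns


def mine_partition(db: List[Sequence], min_support: int, item: str) -> List[Tuple[Pattern, int]]:
    """Mine every pattern starting with `item`; needs no results from other root items."""
    support = sum(1 for seq in db if item in seq)
    return [((item,), support)] + prefixspan(project(db, item), min_support, (item,))


def parallel_style_prefixspan(db: List[Sequence], min_support: int) -> List[Tuple[Pattern, int]]:
    """Run one independent task per frequent root item (hypothetical partitioning)."""
    roots = sorted(frequent_items(db, min_support))
    # In a cluster setting each root could become a separate task; here plain map runs them serially.
    partial = map(lambda item: mine_partition(db, min_support, item), roots)
    return [p for part in partial for p in part]
```

For example, with `db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]` and `min_support = 2`, both functions should return the patterns `(a)`, `(b)`, `(c)` with support 3 and `(a, b)`, `(a, c)`, `(b, c)` with support 2. Splitting by root item keeps the per-task work independent, but the tasks can be unevenly sized when a few items dominate the data.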
Keywords
Distributed system; MapReduce model; PrefixSpan algorithm; Spark platform;
fLanguage
English
Publisher
IET
Conference_Titel
Information and Communications Technologies (ICT 2014), 2014 International Conference on
Conference_Location
Nanjing, China
Type
conf
DOI
10.1049/cp.2014.0616
Filename
6913669