Title :
Towards efficient and scalable data mining using Spark
Author :
Jie Deng ; Zhiguo Qu ; Yongxu Zhu ; Gabriel-Miro Muntean ; Xiaojun Wang
Author_Institution :
The Rince Institute, Dublin City University, Ireland
Abstract :
Driven by the need to discover valuable information in rapidly growing data, data mining technologies have attracted attention over the last decade. However, the big data era places even higher demands on data mining technologies in terms of both processing speed and data volume. A data mining algorithm on its own can hardly meet these requirements for effective processing of big data, so distributed systems are proposed. In this paper, a novel method of integrating a sequential pattern mining algorithm with Spark, a fast large-scale data processing engine, is proposed to mine patterns in big data. We use the well-known algorithm PrefixSpan as an example to demonstrate how this method helps handle massive data rapidly and conveniently. The experiments show that this method can make full use of cluster computing resources to accelerate the mining process, with a better performance than
Keywords :
Distributed system; MapReduce model; PrefixSpan algorithm; Spark platform;
Conference_Titel :
Information and Communications Technologies (ICT 2014), 2014 International Conference on
Conference_Location :
Nanjing, China
DOI :
10.1049/cp.2014.0616