Title :
A Scalable Data Stream Mining Methodology: Stream-Based Holistic Analytics and Reasoning in Parallel
Author :
Fong, Simon ; Yan Zhuang ; Wong, Raymond ; Mohammed, Sabah
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
Abstract :
Although Big Data is a hyped term that raises many technical challenges for both academic research communities and commercial IT deployment, the root sources of Big Data are data streams. Data sourced from streams accumulate continuously, which makes traditional batch-based model induction algorithms infeasible for real-time data mining or, more broadly, for high-speed data analytics. This paper proposes a novel data stream mining methodology called Stream-based Holistic Analytics and Reasoning in Parallel (SHARP). SHARP is based on principles of incremental learning that span a typical data-mining model construction process: lightweight feature selection, one-pass incremental decision tree induction, and incremental swarm optimization. The components of SHARP are designed to work together to improve classification/prediction performance as far as possible. SHARP is scalable: depending on the computing resources available at runtime, the components can execute in parallel, collectively enhancing different aspects of the overall SHARP process for mining data streams. It is believed that if Big Data are mined by incrementally learning a data mining model, one pass at a time and on the fly, the large volume of such data is no longer a technical issue from the perspective of data analytics. Three computer simulation experiments, one for each of the three components of SHARP, are presented to demonstrate its efficacy.
Keywords :
Big Data; data analysis; data mining; decision trees; optimisation; Big Data; SHARP; data-mining model construction process; incremental learning; incremental swarm optimization; lightweight feature selection; one-pass incremental decision tree induction; scalable data stream mining methodology; stream-based holistic analytics and reasoning in parallel; Accuracy; Big data; Classification algorithms; Data mining; Data models; Decision trees; Integrated circuits; CCV feature selection; Cache-based data stream classifier; Data stream mining methodology; Meta-heuristics;
Conference_Title :
Computational and Business Intelligence (ISCBI), 2014 2nd International Symposium on
Print_ISBN :
978-1-4799-7551-8
DOI :
10.1109/ISCBI.2014.31
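Illustrative note :
The abstract's central idea, learning a model incrementally in one pass over a stream instead of re-inducing it in batches, can be sketched as follows. This is a minimal sketch and not the SHARP method: it swaps in a simple nearest-centroid learner for the paper's incremental decision tree, and it omits feature selection and swarm optimization entirely. The names `RunningCentroidClassifier` and `prequential_accuracy` are hypothetical, chosen only for this illustration. Each instance is first used to test the current model and then to update it, so no instance has to be stored.

```python
from collections import defaultdict


class RunningCentroidClassifier:
    """Nearest-centroid classifier updated one instance at a time.

    Stand-in learner for illustration only; the paper itself uses an
    incremental decision tree, which is not reproduced here.
    """

    def __init__(self):
        self.counts = defaultdict(int)  # instances seen per class
        self.means = {}                 # running mean vector per class

    def predict(self, x):
        # Before any instance has been learned there is nothing to predict.
        if not self.means:
            return None

        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.means[label]))

        return min(self.means, key=sq_dist)

    def learn(self, x, y):
        # Incremental mean update: constant memory per class, one pass.
        self.counts[y] += 1
        n = self.counts[y]
        mean = self.means.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            mean[i] += (value - mean[i]) / n


def prequential_accuracy(stream, model):
    """Test-then-train evaluation: predict each instance, then learn it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn(x, y)
    return correct / total if total else 0.0
```

Because `stream` may be any iterable, including a generator that reads from a socket or a file, memory use depends on the number of classes and features and not on the number of instances seen.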