Title :
Data stream analytics and mining in the cloud
Author :
Ari, I. ; Olmezogullari, E. ; Celebi, O.F.
Author_Institution :
Avea Labs. Comput. Sci. Dept., Ozyegin Univ., Istanbul, Turkey
Abstract :
Due to the prevalent use of sensors and network monitoring tools, large volumes of data, or “big data”, today traverse enterprise data processing pipelines in a streaming fashion. While some companies prefer to deploy their data processing infrastructures and services as private clouds, others completely outsource these services to public clouds. In either case, attempting to store the data first for subsequent analysis creates additional resource costs and unwanted delays in obtaining actionable information. As a result, enterprises increasingly employ data or event stream processing systems and want to extend them with complex online analytic and mining capabilities. In this paper, we present implementation details for doing both correlation analysis and association rule mining (ARM) over streams. Specifically, we implement Pearson Product-Moment Correlation for analytics and the Apriori and FPGrowth algorithms for stream mining inside a popular event stream processing engine called Esper. As a unique contribution, we conduct experiments and present performance results of these new tools with different tumbling and sliding time-windows over two different stream types: one for moving bus trajectories and another for web logs from a music site. We find that while tumbling windows may be preferable for performance in certain applications, sliding windows can provide additional benefits for rule mining. We hope that our findings can shed light on the design of other cloud analytics systems.
Keywords :
cloud computing; computerised monitoring; correlation theory; data mining; enterprise resource planning; pipeline processing; sensors; service-oriented architecture; ARM; Apriori & FPGrowth algorithms; Esper; Pearson-product moment correlation; Web logs; actionable information; association rule mining; big data; cloud analytics systems; correlation analysis; data stream analytics; data stream mining; event stream processing systems; moving bus trajectories; network monitoring tools; pipeline enterprise data processing; popular event stream processing engine; public clouds; resource costs; rule mining; services as private clouds; sliding time-windows; tumbling time-windows; Association rules; Correlation; Engines; Itemsets; Real-time systems; Apriori; Complex Event Processing; Data streams; FP-growth; Stream mining
Conference_Title :
Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-4511-8
Electronic_ISBN :
978-1-4673-4509-5
DOI :
10.1109/CloudCom.2012.6427563