Title :
Distributed Discord Discovery: Spark Based Anomaly Detection in Time Series
Author :
Yafei Wu;Yongxin Zhu;Tian Huang;Xinyang Li;Xinyi Liu;Mengyun Liu
Author_Institution :
Sch. of Microelectron., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
The computational complexity of discord discovery is O(m^2), where m is the length of the time series. Many promising methods have been proposed to address this compute-intensive problem, but they discover discords sequentially on a standalone machine. The limited computing power and memory capacity of a standalone machine prevent these methods from discovering discords in large datasets in reasonable time. In this work, we propose a distributed discord discovery method. Our method is able to combine discord results from different computing nodes, which previous literature treated as non-combinable. We mitigate the memory wall by using distributed data partitioning. We implement our method on the Spark distributed computing framework and the HDFS (Hadoop Distributed File System) distributed storage platform. The implementation exhibits superior scalability and enables discord discovery in multi-dimensional time series. We evaluate our method on a terabyte-sized dataset, which is larger than any dataset used in previous literature. Evaluation results show that our method has a clear advantage in performance and efficiency over state-of-the-art algorithms.
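To illustrate the quadratic cost and why partial results can be merged at all, here is a minimal Python sketch. It is not the authors' Spark implementation, and the abstract does not describe their combination rule. The sketch assumes the common definition of a discord as the subsequence whose nearest non-overlapping neighbor is farthest away, uses plain Euclidean distance without normalization, and all function names (`nn_distances`, `brute_force_discord`, `partitioned_discord`) are hypothetical. The merge step relies on one fact: the nearest-neighbor distance over the whole series equals the minimum of the nearest-neighbor distances computed against each partition.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_distances(series, n, candidates, start, stop):
    """For each candidate start index, return the distance to its nearest
    non-overlapping subsequence of length n that starts in [start, stop)."""
    result = {}
    for i in candidates:
        best = math.inf
        for j in range(start, stop):
            if abs(i - j) >= n:  # skip overlapping (trivial) matches
                best = min(best, euclidean(series[i:i + n], series[j:j + n]))
        result[i] = best
    return result


def brute_force_discord(series, n):
    """Single-machine baseline: compares every pair of subsequences, O(m^2)."""
    count = len(series) - n + 1
    nn = nn_distances(series, n, range(count), 0, count)
    idx = max(nn, key=nn.get)
    return idx, nn[idx]


def partitioned_discord(series, n, parts):
    """Assumed merge rule, not taken from the paper: each partition reports
    nearest-neighbor distances against its own slice of start indices, and
    the partial results are combined with an element-wise minimum."""
    count = len(series) - n + 1
    bounds = [(k * count // parts, (k + 1) * count // parts)
              for k in range(parts)]
    merged = {i: math.inf for i in range(count)}
    for start, stop in bounds:  # each iteration stands in for one worker
        partial = nn_distances(series, n, range(count), start, stop)
        for i, d in partial.items():
            merged[i] = min(merged[i], d)
    idx = max(merged, key=merged.get)
    return idx, merged[idx]
```

Calling `brute_force_discord(series, n)` and `partitioned_discord(series, n, parts)` on the same input returns the same index and distance, because the minimum over all start positions decomposes into a minimum of per-partition minima. In this sketch every worker still sees the full series, so it shows only the combination step and not the distributed data partitioning that the abstract credits with mitigating the memory wall.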
Keywords :
"Time series analysis","Algorithm design and analysis","Sparks","Microelectronics","Force","Acceleration","Clustering algorithms"
Conference_Title :
High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
DOI :
10.1109/HPCC-CSS-ICESS.2015.228