DocumentCode :
3717247
Title :
Online anomaly detection over Big Data streams
Author :
Laura Rettig;Mourad Khayati;Philippe Cudr?-Mauroux;Michal Pi?rkowski
Author_Institution :
Big Data and Business Intelligence Competence Center at Swisscom, Bern-Switzerland
fYear :
2015
Firstpage :
1113
Lastpage :
1122
Abstract :
Data quality is a challenging problem in many real world application domains. While a lot of attention has been given to detect anomalies for data at rest, detecting anomalies for streaming applications still largely remains an open problem. For applications involving several data streams, the challenge of detecting anomalies has become harder over time, as data can dynamically evolve in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical data as well as on categorical data and makes no assumption on the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to dynamically detect anomalies. The two metrics we use provide an efficient and effective detection of anomalies over high velocity streams of events. In the following, we describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming for realizing our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics we put forward can be applied to detect several types of anomalies - like infrastructure failures, hardware misconfiguration or user-driven anomalies - in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
Keywords :
"Entropy","Measurement","Correlation","Big data","Data structures","Yttrium","Sparks"
Publisher :
ieee
Conference_Titel :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363865
Filename :
7363865
Link To Document :
بازگشت