Title :
Detecting clustering in streams
Author :
Picollelli, Michael ; Boncelet, Charles ; Marvel, Lisa
Author_Institution :
Univ. of Delaware, Newark, DE, USA
Abstract :
We consider an anomaly detection problem. We are interested in whether a stream of data contains an unusual number or distribution of positives. Abstractly, the problem can be stated as follows: given a binary string, we wish to determine if the number or distribution of 1's differs significantly from a known spontaneous rate. Furthermore, we consider the presence of an adversary who may try to distribute the 1's into 'clusters' to fool our test. We compare tests to detect this type of clustering to a simple test on the number of 1's, and show that clustered data is significantly easier to detect than i.i.d. data. We show that a test on the sum of the reciprocal run lengths in a binary sequence typically performs as well as the classical Wald-Wolfowitz test, and significantly better in some cases. We also show that if the length of the input stream is small, a simple additive correction term improves the detection rate of this test by a modest 1-2%.
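The two statistics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so this rests on assumptions: the reciprocal sum is taken over all maximal runs (of 0's and 1's alike), the Wald-Wolfowitz statistic uses the textbook normal approximation for the number of runs, and no decision threshold or small-sample correction term is included. The function names are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from math import sqrt


def run_lengths(bits):
    """Return the lengths of the maximal runs of identical symbols, in order."""
    return [len(list(group)) for _, group in groupby(bits)]


def reciprocal_run_length_sum(bits):
    """Sum of 1/length over all runs (assumed form of the statistic).

    Clustered data has few, long runs, so the sum is small; well-mixed
    data has many short runs, so the sum is large.
    """
    return sum(1.0 / length for length in run_lengths(bits))


def wald_wolfowitz_z(bits):
    """Textbook Wald-Wolfowitz runs test: z-score of the observed number of runs."""
    n1 = sum(1 for b in bits if b == 1)
    n0 = len(bits) - n1
    n = n0 + n1
    if n0 == 0 or n1 == 0:
        raise ValueError("the runs test needs both symbols to be present")
    runs = len(run_lengths(bits))
    # Mean and variance of the run count under the null hypothesis of a
    # random arrangement of n0 zeros and n1 ones.
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        raise ValueError("sequence too short for the normal approximation")
    return (runs - mean) / sqrt(var)


if __name__ == "__main__":
    clustered = [0] * 8 + [1] * 4 + [0] * 8   # four 1's packed into one cluster
    alternating = [0, 1] * 4                  # every run has length 1
    print(reciprocal_run_length_sum(clustered))
    print(reciprocal_run_length_sum(alternating))
    print(wald_wolfowitz_z(clustered))
```

In this sketch a strongly negative z-score (fewer runs than expected) and a small reciprocal sum both point toward clustering; how the paper turns either statistic into a decision rule is not stated in the abstract.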
Keywords :
binary sequences; pattern clustering; security of data; statistical analysis; Wald-Wolfowitz test; additive correction term; anomaly detection problem; binary string sequence; data clustering detection rate improvement; data streams; positive distribution; reciprocal run length sum; spontaneous rate; unusual number; Standards; Binary sequences; anomaly detection; clustering
Conference_Title :
Information Sciences and Systems (CISS), 2012 46th Annual Conference on
Conference_Location :
Princeton, NJ
Print_ISBN :
978-1-4673-3139-5
Electronic_ISBN :
978-1-4673-3138-8
DOI :
10.1109/CISS.2012.6310747