DocumentCode :
3717128
Title :
Data streaming algorithms for the Kolmogorov-Smirnov test
Author :
Ashwin Lall
Author_Institution :
Dept. of Math. &
fYear :
2015
Firstpage :
95
Lastpage :
104
Abstract :
We propose space-efficient algorithms for performing the Kolmogorov-Smirnov test on streaming data. The Kolmogorov-Smirnov test is a non-parametric test for measuring the strength of a hypothesis that some data is drawn from a fixed distribution (one-sample test), or that two sets of data are drawn from the same distribution (two-sample test). Unlike some other tests, Kolmogorov-Smirnov does not assume that the distribution has a known form (e.g., it is normal), and in the two-sample case it need not know anything about the distribution, other than that it is continuous. Motivated by the challenges of big data, we present algorithms for both the one-sample and the two-sample tests for data processed in a stream. We demonstrate the accuracy of our algorithms via extensive experimentation on both real and synthetic datasets. We show that our algorithms are superior to sampling and that they accurately perform the test with several orders of magnitude reduction in data.
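For orientation, the two-sample statistic that the abstract refers to is the largest absolute gap between the two empirical distribution functions. The sketch below computes it exactly on data held in memory. It is not the streaming algorithm proposed in the paper, which this record does not describe; the function name `ks_two_sample_statistic` is an assumption made for illustration.

```python
from bisect import bisect_right


def ks_two_sample_statistic(xs, ys):
    """Exact two-sample KS statistic D = max over t of |F_x(t) - F_y(t)|.

    Illustrative, non-streaming baseline (hypothetical helper, not from the
    paper): both samples are stored and sorted in full.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(xs), sorted(ys)
    n, m = len(a), len(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every point of the pooled sample.
    for t in a + b:
        fx = bisect_right(a, t) / n  # fraction of xs that are <= t
        fy = bisect_right(b, t) / m  # fraction of ys that are <= t
        d = max(d, abs(fx - fy))
    return d
```

A value near 0 means the two empirical distributions nearly coincide, and a value of 1 means the samples do not overlap at all. Turning the statistic into a p-value requires the Kolmogorov distribution, which this sketch leaves out.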
Keywords :
"Extraterrestrial measurements","Distribution functions","Big data","Internet","Green products","Computational modeling","Standards"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363746
Filename :
7363746