Title :
A Generalized Fast Subset Sums Framework for Bayesian Event Detection
Author :
Shao, Kan; Liu, Yandong; Neill, Daniel B.
Abstract :
We present Generalized Fast Subset Sums (GFSS), a new Bayesian framework for scalable and accurate detection of irregularly shaped spatial clusters using multiple data streams. GFSS extends the previously proposed Multivariate Bayesian Scan Statistic (MBSS) and Fast Subset Sums (FSS) approaches for detection of emerging events. The detection power of MBSS is primarily limited by computational considerations, which restrict it to searching over circular spatial regions. GFSS enables more accurate and timely detection by defining a hierarchical prior over all subsets of the N locations: it first selects a local neighborhood consisting of a center location and its neighbors, and then introduces a sparsity parameter p describing how likely each location in the neighborhood is to be affected. This approach allows us to consider all possible subsets of locations (including irregularly shaped regions) while still putting higher weight on more compact regions. We demonstrate that MBSS and FSS are both special cases of this general framework (assuming p = 1 and p = 0.5, respectively), but that substantially higher detection power can be achieved by choosing an appropriate value of p. We further show that the distribution of the sparsity parameter p can be accurately learned from a small number of labeled events. Our evaluation results (on synthetic disease outbreaks injected into real-world hospital data) show that the GFSS method with a learned sparsity parameter has higher detection power and spatial accuracy than MBSS and FSS, particularly when the affected region is irregular or elongated. We also show that the learned models can be used for event characterization, accurately distinguishing between two otherwise identical event types based on the sparsity of the affected spatial region.
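The hierarchical prior described in the abstract has a convenient consequence within a single neighborhood of k locations. As a minimal sketch, assuming (as an assumption of this sketch, not a detail stated above) that the likelihood ratio of a subset S is the product of hypothetical per-location likelihood ratios LR_i, the prior-weighted sum over all 2^k subsets factorizes as the product of ((1 - p) + p * LR_i), so it costs O(k) rather than O(2^k). The function names are illustrative and not the authors' code; the sketch fixes one neighborhood and one value of p instead of averaging over centers, neighborhood sizes, and a learned distribution of p.

```python
import math
from itertools import combinations


def brute_force_subset_sum(lrs, p):
    """Sum of Pr(S | p) * LR(S) over all 2^k subsets S of one neighborhood.

    Each location is included independently with probability p, and
    LR(S) is ASSUMED to be the product of the per-location ratios in S
    (the empty subset has LR = 1)."""
    k = len(lrs)
    total = 0.0
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            prior = p ** size * (1 - p) ** (k - size)
            total += prior * math.prod(lrs[i] for i in subset)
    return total


def fast_subset_sum(lrs, p):
    """Same quantity in O(k): each location contributes (1 - p) if it is
    excluded or p * LR_i if it is included, so the sum factorizes."""
    return math.prod((1 - p) + p * lr for lr in lrs)


def inclusion_posteriors(lrs, p):
    """Posterior probability that each location is affected, given that
    the event lies in this neighborhood (same factorization assumption).
    Assumes all LR_i > 0 so the denominator is never zero."""
    return [p * lr / ((1 - p) + p * lr) for lr in lrs]


if __name__ == "__main__":
    lrs = [2.0, 0.5, 3.0]  # hypothetical per-location likelihood ratios
    for p in (0.2, 0.5, 1.0):
        print(p, fast_subset_sum(lrs, p), brute_force_subset_sum(lrs, p))
    print(inclusion_posteriors(lrs, 0.5))
```

Setting p = 1 gives nonzero prior only to the full neighborhood, so the sum reduces to the product of all LR_i (the circular-region case attributed to MBSS above), while p = 0.5 weights all 2^k subsets equally (the FSS case); intermediate values favor sparser or denser subsets.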
Keywords :
Bayes methods; data handling; learning (artificial intelligence); set theory; Bayesian event detection; FSS; GFSS; GFSS method; MBSS; data streams; fast subset sum; generalized fast subset sum; irregularly shaped spatial clusters; learned sparsity parameter; multivariate Bayesian scan statistic; real world hospital data; sparsity parameter distribution; synthetic disease outbreaks; Accuracy; Bayesian methods; Diseases; Event detection; Training; biosurveillance; event detection; scan statistics
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4577-2075-8
DOI :
10.1109/ICDM.2011.11