Title :
Fast Anomaly Detection in Dynamic Clinical Datasets Using Near-Optimal Hashing with Concentric Expansions
Author :
Syed, Zeeshan ; Rubinfeld, Ilan
Author_Institution :
Univ. of Michigan, Ann Arbor, MI, USA
Abstract :
While rare clinical events, by definition, occur infrequently in a population, the consequences of these events can often be drastic. Unfortunately, developing risk stratification algorithms for these conditions typically requires collecting large volumes of data to capture enough positive and negative cases for training. This process is slow, expensive, and often burdensome to both patients and caregivers. In this paper, we propose an unsupervised machine learning approach to address this challenge and risk-stratify patients for adverse outcomes without the use of a priori knowledge or labeled training data. The key idea of our approach is to identify high-risk patients as anomalies in a population (i.e., patients lying in sparse regions of the feature space). We identify these cases through a novel algorithm that finds an approximate solution to the k-nearest neighbor problem using locality sensitive hashing (LSH) based on p-stable distributions. Our algorithm is optimized to use multiple LSH searches, each with a geometrically increasing radius, to find the k-nearest neighbors of patients in a dynamically changing dataset where patients are being added or removed over time. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), this approach successfully identified patients at an elevated risk of mortality and rare morbidities. The LSH-based algorithm provided a substantial improvement in runtime over an exact k-nearest neighbor algorithm, while achieving similar accuracy.
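The search strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ExpandingLSH`, every parameter default, and the stopping rule (stop at the first radius that yields at least k candidates) are assumptions made for illustration. It uses Gaussian projections, which are 2-stable and therefore suit Euclidean distance, keeps one group of hash tables per radius, and scores a point by the distance to its k-th approximate neighbor.

```python
import math
import random
from collections import defaultdict


class ExpandingLSH:
    """Hypothetical sketch: approximate k-NN with p-stable LSH at growing radii."""

    def __init__(self, dim, base_radius=1.0, growth=2.0, levels=6,
                 tables=8, projections=4, seed=0):
        rng = random.Random(seed)
        self.points = {}   # point id -> vector
        self.levels = []   # one group of hash tables per search radius
        for level in range(levels):
            # Assumption: the bucket width grows geometrically with the radius.
            width = base_radius * growth ** level
            group = []
            for _ in range(tables):
                funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                          rng.uniform(0.0, width))
                         for _ in range(projections)]
                group.append((funcs, defaultdict(set)))
            self.levels.append((width, group))

    @staticmethod
    def _key(funcs, width, vec):
        # h(v) = floor((a . v + b) / w), concatenated over all projections
        return tuple(
            math.floor((sum(a_i * x for a_i, x in zip(a, vec)) + b) / width)
            for a, b in funcs)

    def add(self, pid, vec):
        """Insert a point (for example, a newly admitted patient)."""
        if pid in self.points:
            self.remove(pid)
        self.points[pid] = tuple(vec)
        for width, group in self.levels:
            for funcs, buckets in group:
                buckets[self._key(funcs, width, vec)].add(pid)

    def remove(self, pid):
        """Delete a point from every table."""
        vec = self.points.pop(pid)
        for width, group in self.levels:
            for funcs, buckets in group:
                buckets[self._key(funcs, width, vec)].discard(pid)

    def knn(self, vec, k, exclude=None):
        """Search radius by radius; stop once at least k candidates collide."""
        candidates = set()
        for width, group in self.levels:
            for funcs, buckets in group:
                candidates |= buckets.get(self._key(funcs, width, vec), set())
            candidates.discard(exclude)
            if len(candidates) >= k:
                break
        scored = sorted((math.dist(vec, self.points[c]), c) for c in candidates)
        return scored[:k]

    def anomaly_score(self, pid, k):
        """Distance to the k-th approximate neighbor; inf if fewer than k found."""
        neighbors = self.knn(self.points[pid], k, exclude=pid)
        return neighbors[-1][0] if len(neighbors) == k else math.inf
```

In this sketch, points in dense regions usually collect k candidates at the smallest radius, while isolated points must expand through several radii or find no neighbors at all, so a larger score suggests a more anomalous point. Because `add` and `remove` touch only the buckets a point hashes to, the index can follow a dataset that changes over time without being rebuilt.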
Keywords :
cryptography; medical computing; unsupervised learning; LSH-based algorithm; anomaly detection; concentric expansion; dynamic clinical dataset; k-nearest neighbor problem; locality sensitive hashing; near-optimal hashing; p-stable distribution; risk stratification algorithm; unsupervised machine learning; massive datasets; nearest neighbors; risk stratification
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
DOI :
10.1109/ICDMW.2010.88