Title :
Searching Uncertain Data Represented by Non-axis Parallel Gaussian Mixture Models
Author :
Haegler, Katrin ; Fiedler, Frank ; Böhm, Christian
Author_Institution :
Inst. for Inf., Univ. of Munich, Munich, Germany
Abstract :
Efficient similarity search in uncertain data is a central problem in many modern applications, such as biometric identification, stock market analysis, sensor networks, and medical imaging. In such applications, the feature vector of an object is not known exactly but is instead described by a probability density function such as a Gaussian Mixture Model (GMM). Previous work is limited to axis-parallel Gaussian distributions, so correlations between different features are not considered in the similarity search. In this paper, we propose SUDN, a novel, efficient similarity search technique for general GMMs that makes no independence assumption for the attributes and approximates the actual components of a GMM in a conservative but tight way. A filter-refinement architecture guarantees no false dismissals, owing to the conservativeness of the approximations, and good filter selectivity, owing to their tightness. An extensive experimental evaluation of SUDN demonstrates a considerable speed-up of similarity queries on general GMMs and an increase in accuracy compared to existing approaches.
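The filter-refinement idea for a most-likely identification query (MLIQ, listed in the keywords) can be illustrated with a minimal Python sketch. This is not the SUDN approximation from the paper. As a stand-in assumption, each full-covariance Gaussian is bounded from above by an isotropic Gaussian-shaped function built from the largest eigenvalue of its covariance matrix, which is conservative because (x - mu)^T Sigma^{-1} (x - mu) >= ||x - mu||^2 / lambda_max(Sigma), but it is probably much looser than the paper's approximation. All function names and the data layout are hypothetical.

```python
import numpy as np

def gmm_density(q, weights, means, covs):
    """Exact density of a full-covariance (non-axis-parallel) GMM at point q."""
    total = 0.0
    for w, mu, cov in zip(weights, means, covs):
        d = len(mu)
        diff = q - mu
        maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
        total += w * np.exp(-0.5 * maha) / norm
    return total

def precompute_bound(weights, means, covs):
    """Per component: weighted peak density, mean, largest covariance eigenvalue."""
    params = []
    for w, mu, cov in zip(weights, means, covs):
        d = len(mu)
        peak = w / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
        lam_max = np.linalg.eigvalsh(cov)[-1]  # eigenvalues come back ascending
        params.append((peak, mu, lam_max))
    return params

def upper_bound(q, params):
    """Conservative (never too small) estimate of the GMM density at q."""
    return sum(peak * np.exp(-0.5 * np.sum((q - mu) ** 2) / lam)
               for peak, mu, lam in params)

def most_likely_object(q, objects):
    """Filter-refinement search for the object with the highest density at q.

    objects is a list of (weights, means, covs) tuples, one per uncertain object.
    Returns (index of the best object, number of exact density evaluations).
    """
    # Filter step: cheap upper bounds (the parameters could be stored in an index).
    bounds = [upper_bound(q, precompute_bound(*obj)) for obj in objects]
    order = np.argsort(bounds)[::-1]  # most promising candidates first
    best_idx, best_val, refined = -1, -np.inf, 0
    for i in order:
        if bounds[i] < best_val:
            break  # no remaining candidate can beat the current best
        val = gmm_density(q, *objects[i])  # refinement step: exact density
        refined += 1
        if val > best_val:
            best_idx, best_val = int(i), val
    return best_idx, refined
```

Because the bound never underestimates the exact density, stopping at the first candidate whose bound falls below the best exact value cannot dismiss the true answer. How many exact evaluations are saved depends on how tight the bound is, which is the role the abstract attributes to the tightness of the approximations.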
Keywords :
Gaussian processes; data handling; query formulation; vectors; SUDN; feature vector; filter-refinement architecture; non-axis parallel Gaussian mixture models; probability density function; similarity search; uncertain data searching; Approximation methods; Covariance matrix; Databases; Gaussian distribution; Uncertainty; MLIQ; Gaussian mixture model; non-axis parallel GMM; uncertain data
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.7