Title :
Learning Minimum Volume Sets with Support Vector Machines
Author :
Davenport, Mark A. ; Baraniuk, Richard G. ; Scott, Clayton D.
Author_Institution :
Dept. of Electr. & Comput. Eng., Rice Univ., Houston, TX
Abstract :
Given a probability law P on d-dimensional Euclidean space, the minimum volume set (MV-set) with mass beta, 0 < beta < 1, is the set with smallest volume enclosing a probability mass of at least beta. We examine the use of support vector machines (SVMs) for estimating an MV-set from a collection of data points drawn from P, a problem with applications in clustering and anomaly detection. We investigate both one-class and two-class methods. The two-class approach reduces the problem to Neyman-Pearson (NP) classification, where we artificially generate a second class of data points according to a uniform distribution. The simple approach to generating the uniform data suffers from the curse of dimensionality. In this paper we (1) describe the reduction of MV-set estimation to NP classification, (2) devise improved methods for generating artificial uniform data for the two-class approach, (3) advocate a new performance measure for systematic comparison of MV-set algorithms, and (4) establish a set of benchmark experiments to serve as a point of reference for future MV-set algorithms. We find that, in general, the two-class method performs more reliably.
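The abstract states that naively generated uniform data suffers from the curse of dimensionality but does not say why. One common way to see the effect, offered here as an illustration and not necessarily the argument made in the paper, is to ask what fraction of points drawn uniformly from the cube [-1, 1]^d land inside the inscribed unit ball. The sketch below computes that fraction exactly and checks it by Monte Carlo sampling; the function names `ball_fraction` and `ball_fraction_mc` are hypothetical and chosen only for this example.

```python
import math
import random


def ball_fraction(d):
    """Exact fraction of the cube [-1, 1]^d occupied by the inscribed unit ball."""
    # Volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    # Volume of the cube with side length 2.
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume


def ball_fraction_mc(d, n, seed=0):
    """Monte Carlo estimate of the same fraction from n uniform draws."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(n):
        point = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        if sum(x * x for x in point) <= 1.0:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # The fraction shrinks rapidly as the dimension grows.
    for d in (1, 2, 5, 10, 20):
        print(d, ball_fraction(d))
```

Under these assumptions, the share of uniform points that fall in a compact central region drops rapidly with d, so a fixed budget of uniformly sampled artificial points places very few of them near where the data concentrates. This is the kind of inefficiency that improved sampling schemes would aim to reduce.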
Keywords :
pattern classification; probability; set theory; support vector machines; Euclidean space; Neyman-Pearson classification; artificial uniform data; minimum volume set; probability law; support vector machine; uniform distribution; Ellipsoids; Fault detection; Gaussian distribution; Level set; Machine learning; Statistics; Support vector machine classification; Volume measurement
Conference_Title :
Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing
Conference_Location :
Arlington, VA
Print_ISBN :
1-4244-0656-0
ISSN :
1551-2541
DOI :
10.1109/MLSP.2006.275565