Abstract:
While many efforts have been put into the development of nonlinear approximation theory and its applications to signal and image compression, encoding, and denoising, there seem to be very few theoretical developments of adaptive discriminant representations in the area of feature extraction, selection, and signal classification. In this paper, we advocate the idea that such developments and efforts are worthwhile, based on the theoretical study of a data-driven discriminant analysis method on a simple yet instructive example. We consider the problem of classifying a signal drawn from a mixture of two classes, using its projections onto low-dimensional subspaces. Unlike the linear discriminant analysis (LDA) strategy, which selects subspaces that do not depend on the observed signal, we consider an adaptive, sequential selection of projections, in the spirit of nonlinear approximation and classification and regression trees (CART): at each step, the subspace is enlarged in a direction that maximizes the mutual information with the unknown class. We derive explicit characterizations of this adaptive discriminant analysis (ADA) strategy in two situations. When the two classes are Gaussian with the same covariance matrix but different means, the adaptive subspaces are actually nonadaptive and can be computed with an algorithm similar to orthonormal matching pursuit. When the classes are centered Gaussians with different covariances, the adaptive subspaces are spanned by eigenvectors of an operator given by the covariance matrices (just as would be predicted by regular LDA); however, we prove that the order in which the components along these eigenvectors are observed actually depends on the observed signal. Numerical experiments on synthetic data illustrate how data-dependent features can be used to outperform LDA on a classification task, and we discuss how our results could be applied in practice.
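To convey the flavor of the greedy, mutual-information-driven selection of projections described above, here is a minimal illustrative sketch in Python. It is not the authors' algorithm: it implements only the non-adaptive variant, greedily picking candidate directions that maximize a Monte Carlo estimate of the mutual information between the class label and the projected signal, for two centered Gaussian classes with assumed covariances Sigma0 and Sigma1 and equal priors. The function names mixture_mi and greedy_directions are hypothetical, and the ADA strategy analyzed in the paper additionally conditions each step on the projection values already observed for the particular signal being classified.

```python
# Illustrative sketch (not the paper's algorithm): greedy selection of projection
# directions maximizing a Monte Carlo estimate of I(Y; W^T X) for a mixture of two
# zero-mean Gaussian classes with covariances Sigma0, Sigma1 and equal priors.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

def mixture_mi(W, Sigma0, Sigma1, n_mc=20000):
    """Estimate I(Y; W^T X) = H(W^T X) - H(W^T X | Y) by Monte Carlo."""
    k = W.shape[1]
    C0, C1 = W.T @ Sigma0 @ W, W.T @ Sigma1 @ W
    # Conditional entropy: average of the two projected Gaussian entropies.
    h_cond = 0.5 * sum(0.5 * np.log(np.linalg.det(2 * np.pi * np.e * C))
                       for C in (C0, C1))
    # Marginal entropy of the projected mixture, estimated from samples.
    y = rng.integers(0, 2, size=n_mc)
    z = np.where(y[:, None] == 0,
                 rng.multivariate_normal(np.zeros(k), C0, size=n_mc),
                 rng.multivariate_normal(np.zeros(k), C1, size=n_mc))
    dens = 0.5 * (multivariate_normal.pdf(z, cov=C0)
                  + multivariate_normal.pdf(z, cov=C1))
    h_marg = -np.mean(np.log(dens))
    return h_marg - h_cond

def greedy_directions(candidates, Sigma0, Sigma1, n_steps):
    """Greedily add columns of `candidates` that maximize the MI estimate."""
    chosen, remaining = [], list(range(candidates.shape[1]))
    for _ in range(n_steps):
        scores = [mixture_mi(candidates[:, chosen + [j]], Sigma0, Sigma1)
                  for j in remaining]
        chosen.append(remaining.pop(int(np.argmax(scores))))
    return candidates[:, chosen]

# Example usage with hypothetical covariances and the canonical basis as candidates.
d = 6
A = rng.standard_normal((d, d)); Sigma0 = A @ A.T + np.eye(d)
B = rng.standard_normal((d, d)); Sigma1 = B @ B.T + np.eye(d)
W = greedy_directions(np.eye(d), Sigma0, Sigma1, n_steps=2)
```

In the equal-covariance, different-means case described in the abstract, such a greedy criterion does not depend on the observed signal; in the centered, different-covariances case, the paper's adaptive criterion makes the order of selected eigenvector directions signal-dependent, which the non-adaptive sketch above deliberately omits.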
Keywords:
Classification and regression trees (CART); classification tree; discriminant analysis; mutual information; nonlinear approximation; projection pursuit; sequential testing