Abstract:
Bayesian classifiers rely on models of the a priori and
class-conditional feature distributions; the classifier is trained by
optimizing these models to best represent features observed in a
training corpus according to a chosen criterion. In many problems
of interest, the true class-conditional feature probability density
function (PDF) is not a member of the set of PDFs the classifier
can represent. Previous research has shown that the effect of
this problem may be reduced either by improving the models or
by transforming the features used in the classifier. This paper
addresses this model mismatch problem in statistical identification,
classification, and recognition systems. We formulate it as the
minimization of the relative entropy, also known as the
Kullback–Leibler distance, between the true conditional PDF and the
hypothesized probabilistic model. From this formulation, we derive a
computationally efficient solution built on volume-preserving maps; existing
linear transform designs are shown to be special cases of the
proposed solution. Using this result, we propose the symplectic
maximum likelihood transform (SMLT), which is a nonlinear
volume-preserving extension of the maximum likelihood linear
transform (MLLT). This approach has many applications in
statistical modeling, classification, and recognition. We apply it
to the maximum likelihood estimation (MLE) of the joint PDF of
order statistics and show a significant increase in the likelihood
for the same number of parameters. We also provide phoneme
recognition experiments that show an improvement in recognition
accuracy compared with using the baseline Mel-Frequency Cepstrum
Coefficient (MFCC) features or using MLLT. We present an iterative
algorithm to jointly estimate the parameters of the symplectic
map and the probabilistic model for both applications.
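
A minimal sketch may help fix ideas (the notation p, q_\theta, and f is assumed here, not taken from the paper): for an invertible map z = f(x) that carries the true PDF p to a transformed PDF p_f, the relative entropy to the model q_\theta decomposes as
\[
D(p_f \,\|\, q_\theta) = -h(p_f) - \mathbb{E}_{p_f}\!\left[\log q_\theta(z)\right],
\qquad
h(p_f) = h(p) + \mathbb{E}_{p}\!\left[\log \left|\det J_f(x)\right|\right],
\]
where h denotes differential entropy and J_f the Jacobian of f. If f is volume preserving, then |det J_f| = 1, the entropy term is constant in f, and minimizing the Kullback–Leibler distance over (f, \theta) reduces to maximizing the expected log-likelihood \mathbb{E}_p[\log q_\theta(f(x))]; replacing the expectation with a training-set average yields the sample objective that an iterative algorithm can alternately optimize over the map parameters and the model parameters.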