• DocumentCode
    846215
  • Title
    Approximately independent factors of speech using nonlinear symplectic transformation
  • Author
    Omar, Mohamed Kamal ; Hasegawa-Johnson, Mark
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Illinois, Urbana, IL, USA
  • Volume
    11
  • Issue
    6
  • fYear
    2003
  • Firstpage
    660
  • Lastpage
    671
  • Abstract
    This paper addresses the problem of representing the speech signal using a set of features that are approximately statistically independent. This statistical independence simplifies building probabilistic models based on these features, which can be used in applications such as speech recognition. Since there is no evidence that the speech signal is a linear combination of separate factors or sources, we use a more general nonlinear transformation of the speech signal to obtain an approximately statistically independent feature set. We choose the transformation to be symplectic to maximize the likelihood of the generated feature set. In this paper, we describe applying this nonlinear transformation directly to the time-domain speech data and to the Mel-frequency cepstrum coefficients (MFCC). We also discuss experiments in which the generated feature set is transformed into a more compact set using a maximum mutual information linear transformation. This linear transformation is used to generate acoustic features that represent the distinctions among the phonemes. The features resulting from this transformation are used in phoneme recognition experiments. The best results show an improvement of about 2% in recognition accuracy compared to results based on MFCC features.
  • Keywords
    cepstral analysis; feature extraction; independent component analysis; maximum likelihood estimation; speech recognition; transforms; Mel-frequency cepstrum coefficients; feature extraction; independent components analysis; maximum likelihood; maximum mutual information linear transformation; nonlinear transformation; phoneme recognition; speech recognition; speech signal; statistical independence; symplectic map; volume-preserving transform; Automatic speech recognition; Cepstrum; Character recognition; Independent component analysis; Information theory; Mel frequency cepstral coefficient; Mutual information; Speech analysis; Speech recognition; Time domain analysis;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2003.814457
  • Filename
    1255453
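The abstract's link between a symplectic transformation and likelihood maximization rests on the change-of-variables formula, log p_X(x) = log p_Y(f(x)) + log|det J_f(x)|. Symplectic maps are volume-preserving (det J_f = 1), so the Jacobian term vanishes and the likelihood of the inputs reduces to the likelihood of the transformed features under a factorized (independent) model. The sketch below illustrates only this property. It assumes a hypothetical two-dimensional map built from two shear steps with tanh nonlinearities, illustrative parameters `A` and `B`, and a factorized standard-normal model for the outputs. None of these choices come from this record, and this is not the paper's parameterization.

```python
import math

# Hypothetical illustrative parameters (assumed, not from the paper).
A, B = 0.8, 1.3

def symplectic_map(q, p):
    """Hypothetical 2-D symplectic map: two shear steps.

    Each shear has a Jacobian with determinant exactly 1, so the
    composition is volume-preserving (area-preserving in 2-D).
    """
    q2 = q + A * math.tanh(p)   # shear in q: depends only on p
    p2 = p + B * math.tanh(q2)  # shear in p: depends only on the new q
    return q2, p2

def inverse_map(q2, p2):
    """Exact inverse: undo the two shears in reverse order."""
    p = p2 - B * math.tanh(q2)
    q = q2 - A * math.tanh(p)
    return q, p

def jacobian_det(f, q, p, h=1e-6):
    """Central-difference estimate of det(d f / d(q, p))."""
    fq_plus, fq_minus = f(q + h, p), f(q - h, p)
    fp_plus, fp_minus = f(q, p + h), f(q, p - h)
    dq = [(a - b) / (2 * h) for a, b in zip(fq_plus, fq_minus)]  # column d(out)/dq
    dp = [(a - b) / (2 * h) for a, b in zip(fp_plus, fp_minus)]  # column d(out)/dp
    return dq[0] * dp[1] - dp[0] * dq[1]

def log_std_normal(y):
    """Log density of a standard normal variable."""
    return -0.5 * (y * y + math.log(2 * math.pi))

def log_likelihood(x_points):
    """Log-likelihood of inputs under an assumed factorized Gaussian model of the outputs.

    The change-of-variables term log|det J| is zero for a volume-preserving
    map, so only the factorized output densities are summed.
    """
    total = 0.0
    for q, p in x_points:
        y1, y2 = symplectic_map(q, p)
        total += log_std_normal(y1) + log_std_normal(y2)
    return total

if __name__ == "__main__":
    for point in [(0.0, 0.0), (1.5, -0.7), (-2.0, 3.0)]:
        print(point, round(jacobian_det(symplectic_map, *point), 6))
```

Each printed determinant should round to 1.0. Composing shears keeps the map invertible with an inexpensive exact inverse. A richer map, such as more shears or more dimensions, could be trained to make the outputs closer to independent without ever needing to compute a Jacobian determinant.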