• DocumentCode
    2791513
  • Title

    Analysis of phone posterior feature space exploiting class-specific sparsity and MLP-based similarity measure

  • Author

    Asaei, Afsaneh ; Picart, Benjamin ; Bourlard, Hervé

  • Author_Institution
    IDIAP Res. Inst., Martigny, Switzerland
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    4886
  • Lastpage
    4889
  • Abstract
    Class posterior distributions have recently been used quite successfully in Automatic Speech Recognition (ASR), either for frame- or phone-level classification or as acoustic features that can be further exploited (usually after some “ad hoc” transformations) in different classifiers (e.g., in Gaussian-mixture-based HMMs). In the present paper, we report preliminary results suggesting that it may be possible to perform speech recognition without explicit subword unit (phone) classification or likelihood estimation, simply by answering the question of whether two acoustic (posterior) vectors belong to the same subword unit class. We first exhibit specific properties of the posterior acoustic space, then show how those properties can be exploited to reach very high performance in deciding (based on an appropriate trained distance metric and hypothesis-testing approaches) whether two posterior vectors belong to the same class. Correct decision rates as high as 90% are reported on the TIMIT database, followed by kNN phone classification rates.
  • Keywords
    multilayer perceptrons; pattern classification; speech recognition; MLP-based similarity measure; TIMIT database; acoustic vectors; automatic speech recognition; class posterior distributions; class-specific sparsity; kNN phone classification rates; phone posterior feature space; Acoustic testing; Automatic speech recognition; Extraterrestrial measurements; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Multilayer perceptrons; Spatial databases; Speech analysis; Speech recognition; Posterior feature space; kNN classifier; posterior space properties; posterior-based metrics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2010.5495121
  • Filename
    5495121