  • DocumentCode
    2800840
  • Title
    Learning-based auditory encoding for robust speech recognition
  • Author
    Chiu, Yu-Hsiang Bosco; Raj, Bhiksha; Stern, Richard M.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    4278
  • Lastpage
    4281
  • Abstract
    This paper describes ways of speeding up the optimization process used to learn physiologically motivated components of a feature computation module directly from data. During training, word lattices generated by the speech decoder, together with conjugate gradient descent, were used to train the parameters of logistic functions so as to maximize the a posteriori probability of the correct class in the training data. These functions represent the rate-level nonlinearities found in most mammalian auditory systems. Experiments conducted using the CMU SPHINX-III system on the DARPA Resource Management and Wall Street Journal tasks show that using discriminative training to estimate the shape of the rate-level nonlinearity yields better recognition accuracy in the presence of background noise than traditional procedures that do not employ learning. More importantly, conjugate gradient descent optimization, combined with word lattices that reduce the number of hypotheses considered, greatly increases training speed, which makes it possible to train much more complicated models.
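    The abstract does not state how the logistic rate-level nonlinearity is parameterized. The following is a minimal sketch assuming a three-parameter logistic (the names gain, slope and threshold, and the class RateLevel, are hypothetical) applied to one channel's log energy, together with the partial derivatives that a gradient-based optimizer such as conjugate gradient descent would consume. It does not implement the paper's lattice-based a posteriori objective.

```python
import math
from dataclasses import dataclass


@dataclass
class RateLevel:
    """Hypothetical logistic rate-level nonlinearity (assumed parameterization).

    gain:      saturation (maximum) output level
    slope:     steepness of the transition region
    threshold: input level at which the output reaches half of `gain`
    """
    gain: float
    slope: float
    threshold: float

    def _sigmoid(self, x: float) -> float:
        # Numerically stable logistic of z = slope * (x - threshold).
        z = self.slope * (x - self.threshold)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def __call__(self, x: float) -> float:
        # Output "rate" for an input level x (e.g. channel log energy in dB).
        return self.gain * self._sigmoid(x)

    def grad(self, x: float) -> tuple[float, float, float]:
        """Partial derivatives of the output w.r.t. (gain, slope, threshold)."""
        g = self._sigmoid(x)
        dg = g * (1.0 - g)  # derivative of the logistic w.r.t. its argument z
        return (
            g,
            self.gain * dg * (x - self.threshold),
            -self.gain * dg * self.slope,
        )


if __name__ == "__main__":
    f = RateLevel(gain=1.0, slope=0.5, threshold=40.0)
    # At the threshold the output is exactly half of the gain.
    print(f(40.0))
    print(f.grad(43.0))
```

    In a trainer, the per-frame gradients of the objective would be chained through grad() and summed over channels and frames; that objective, and the lattice pruning that makes it cheap, are left out here.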
  • Keywords
    acoustic noise; acoustic signal processing; conjugate gradient methods; hearing; speech coding; speech recognition; background noise; conjugate gradient; conjugate gradient descent optimization; discriminative training; feature computation module; learning-based auditory encoding; logistic functions; mammalian auditory systems; physiologically-motivated components; posteriori probability; rate-level nonlinearities; robust speech recognition; speech decoder; word lattices; Auditory system; Decoding; Encoding; Lattices; Logistics; Management training; Resource management; Robustness; Speech recognition; Training data; auditory models; automatic speech recognition; data analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495666
  • Filename
    5495666