• DocumentCode
    2874818
  • Title

    Relevance of auditory cortical representations to speech processing and recognition

  • Author

    Shamma, Shihab

  • Author_Institution
    Maryland Univ., College Park, MD
  • fYear
    2005
  • fDate
    27-27 Nov. 2005
  • Firstpage
    5
  • Lastpage
    5
  • Abstract
    Summary form only given. Humans are readily capable of understanding speech despite substantial distortions, high levels of ambient noise, or interference from other speakers. Several factors are responsible for this robust performance ranging all the way from stable early auditory representations to sophisticated linguistic knowledge. In this talk, I shall describe processes that occur at intermediate levels of the central auditory pathway, specifically the midbrain and primary auditory cortex. At these levels, the relatively simple short-term acoustic spectra extracted early at the cochlea are elaborated into multi-dimensional representations that integrate spectral and temporal information over many scales. This transformation is accomplished by cortical cells that are not simply selective to the spectral energy of the acoustic signal, but rather to the complex combinations of its spectral and temporal modulations that are the true carriers of intelligibility in speech, and more generally of timbre in sound. For instance, some cells may encode selectively rapidly changing broadband spectra, while others are sensitive to slowly varying narrowband energy. This decomposition of the spectrogram affords the brain both a rich and a versatile representation that can be employed as a "metric" to assess sound quality or speech intelligibility, as well as to manipulate its characteristics in a variety of auditory tasks. I will explain in this talk the physiological and psychoacoustical data relevant to these representations, the mathematical formulation of the cortical model, and how it can be adapted to applications in ASR, assessment of speech intelligibility, speech enhancement, and signal conditioning for hearing aids. 
    I shall also highlight recent approaches in ASR that incorporate many of the features that make these representations powerful, specifically the integration of spectral information over relatively long time scales (100s of ms) and over broad spectral bandwidths (> 1 octave). Finally, I shall discuss the relevance of new discoveries in rapid cortical plasticity to the design of adaptive speech processing strategies and algorithms for separating speech streams on monaural channels.
  • Keywords
    acoustic signal processing; hearing aids; speech processing; speech recognition; acoustical signal; adaptive speech processing; auditory cortical representations; cortical cells; rapid cortical plasticity; sophisticated linguistic knowledge; Acoustic distortion; Acoustic noise; Automatic speech recognition; Humans; Interference; Noise level; Noise robustness; Speech enhancement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
  • Conference_Location
    San Juan
  • Print_ISBN
    0-7803-9478-X
  • Electronic_ISBN
    0-7803-9479-8
  • Type

    conf

  • DOI
    10.1109/ASRU.2005.1566462
  • Filename
    1566462