Relevance of auditory cortical representations to speech processing and recognition

Author

Shamma, Shihab

Author_Institution

Maryland Univ., College Park, MD

fYear

2005

fDate

27-27 Nov. 2005

Firstpage

5

Lastpage

5

Abstract

Summary form only given. Humans are readily capable of understanding speech despite substantial distortions, high levels of ambient noise, or interference from other speakers. Several factors are responsible for this robust performance, ranging from stable early auditory representations to sophisticated linguistic knowledge. In this talk, I shall describe processes that occur at intermediate levels of the central auditory pathway, specifically the midbrain and primary auditory cortex. At these levels, the relatively simple short-term acoustic spectra extracted early at the cochlea are elaborated into multi-dimensional representations that integrate spectral and temporal information over many scales. This transformation is accomplished by cortical cells that are not simply selective to the spectral energy of the acoustic signal, but rather to the complex combinations of its spectral and temporal modulations that are the true carriers of intelligibility in speech, and more generally of timbre in sound. For instance, some cells may selectively encode rapidly changing broadband spectra, while others are sensitive to slowly varying narrowband energy. This decomposition of the spectrogram affords the brain a rich and versatile representation that can be employed as a "metric" to assess sound quality or speech intelligibility, as well as to manipulate the sound's characteristics in a variety of auditory tasks. I will explain the physiological and psychoacoustical data relevant to these representations, the mathematical formulation of the cortical model, and how it can be adapted to applications in ASR, assessment of speech intelligibility, speech enhancement, and signal conditioning for hearing aids.
I shall also highlight recent approaches in ASR that incorporate many of the features that make these representations powerful, specifically the integration of spectral information over relatively long time scales (100s of ms) and over broad spectral bandwidths (> 1 octave). Finally, I shall discuss the relevance of new discoveries in rapid cortical plasticity to the design of adaptive speech processing strategies and algorithms for separating speech streams on monaural channels.

Keywords

acoustic signal processing; hearing aids; speech processing; speech recognition; acoustical signal; adaptive speech processing; auditory cortical representations; cortical cells; rapid cortical plasticity; sophisticated linguistic knowledge; Acoustic distortion; Acoustic noise; Automatic speech recognition; Humans; Interference; Noise level; Noise robustness; Speech enhancement

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on

Conference_Location

San Juan

Print_ISBN

0-7803-9478-X

Electronic_ISBN

0-7803-9479-8

Type

conf

DOI

10.1109/ASRU.2005.1566462

Filename

1566462