Author :
O'Shaughnessy, Douglas ; Kolumban, Geza ; Lecomte, Roger
Abstract :
Speech coding has found great success in today's widespread use of cellphones. In addition, people are increasingly accustomed to hearing and accepting synthetic voices when they access information by phone. A third major speech technology, automatic speech recognition (ASR), is also seeing significant use, but it still has major limitations and falls far short of what human listeners can do. This talk will examine the modern techniques applied to recognizing the information present in speech: its textual content, the identity of the speaker, the speaker's emotional state, and the language used. We will examine ways to extract relevant parameters while ignoring channel distortions and extraneous sounds that may also be present in the signal. A brief history of ASR development will show how the use of Fourier analysis, linear prediction, the cepstrum, and neural networks has evolved. The strengths and weaknesses of the modern approach to ASR, which uses mel-frequency cepstral coefficients (MFCCs), hidden Markov models (HMMs), and language models, will be discussed.