DocumentCode :
1354626
Title :
Time-frequency analysis and auditory modeling for automatic recognition of speech
Author :
Pitton, James W. ; Wang, Kuansan ; Juang, Biing-hwang
Author_Institution :
MathSoft, Seattle, WA, USA
Volume :
84
Issue :
9
fYear :
1996
fDate :
9/1/1996
Firstpage :
1199
Lastpage :
1215
Abstract :
Modern speech processing research may be categorized into three broad areas: statistical, physiological, and perceptual. Statistical research investigates the nature of the variability of the speech waveform from a signal processing viewpoint. This approach relates to the processing of speech in order to obtain measurements of speech characteristics which demonstrate manageable variabilities across a wide range of the talker population, in the presence of noise or competing speakers as well as the interaction of speech with the channel through which it is transmitted, and under the inherent interaction of the information content of speech itself (i.e., the contextual factor). Physiological research aims at constructing accurate models of the articulatory and auditory process, helping to limit the signal space for speech processing. In the perceptual realm, work focuses on understanding the psychoacoustic and possibly the psycholinguistic aspects of the speech communication process that the human so conveniently conducts. By studying this working analysis/recognition system, insights may be garnered that will lead to improved methods of speech processing. Conversely, by studying the limitations of this system, particularly how it reduces the information rate of the received signal through, for example, masking and adaptation, improvements may be made in the efficiency of speech coding schemes without impacting the quality of the reconstructed speech. Thus, comprehension of speech production and perception impacts methods of speech processing, and vice versa. This paper enunciates such a position, focusing on how modern time-frequency signal analysis methods could help expedite needed advances in these areas.
Keywords :
channel capacity; hearing; speech coding; speech intelligibility; speech processing; speech recognition; statistical analysis; time-frequency analysis; articulatory process; auditory modeling; auditory process; automatic speech recognition; contextual factor; information rate reduction; measurements; noise; perceptual research; physiological research; psychoacoustic aspects; psycholinguistic aspects; signal processing; signal space; speech analysis; speech characteristics; speech communication; speech processing research; speech waveform; statistical research; talker population; time-frequency analysis; Content management; Context; Noise measurement; Psychoacoustic models; Psychology; Signal processing; Speech coding; Speech enhancement; Speech processing; Time frequency analysis;
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/5.535241
Filename :
535241