Title :
On the perceptual distance between speech segments
Author :
Ghitza, Oded ; Sondhi, M. Mohan
Author_Institution :
Dept. of Acoust. Res., AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
For many tasks in speech signal processing it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (By speech segments the authors mean pieces of a speech signal of duration 50-150 milliseconds. For concreteness they consider a segment to mean a diphone.) Such a distance metric would be useful for speech coding at low bit rates. Saving bits in such systems relies on a perceptual tolerance to acoustic deviations from the original speech, deviations that typically last for several tens of milliseconds. Such a metric would also be useful for automatic speech recognition, on the assumption that perceptual invariance to adverse signal conditions (noise, microphone and channel distortions, room reverberations) and to phonemic variability (due to non-uniqueness of articulatory gestures) may provide a basis for robust performance. Here the authors describe their attempts at defining such a metric.
Keywords :
hearing; psychology; speech processing; 50 to 150 ms; acoustic deviations tolerance; adverse signal conditions; articulatory gestures nonuniqueness; auditory perception; channel distortions; diphone; microphone distortions; noise distortions; original speech; perceptual distance between speech segments; perceptual invariance; phonemic variability; room reverberations; speech coding; speech signal; Acoustic distortion; Acoustic noise; Automatic speech recognition; Bit rate; Microphones; Noise robustness; Reverberation; Signal processing; Speech coding; Speech processing
Conference_Title :
Proceedings of the 1996 IEEE Twenty-Second Annual Northeast Bioengineering Conference
Conference_Location :
New Brunswick, NJ
Print_ISBN :
0-7803-3204-0
DOI :
10.1109/NEBC.1996.503206