Title :
An approach to segmenting speech into vowel- and nonvowel-like intervals
Author :
Kasuya, Hideki ; Wakita, Hisashi
Author_Institution :
Utsunomiya University, Utsunomiya, Japan
fDate :
8/1/1979
Abstract :
A speaker-independent algorithm is given for segmenting continuous English speech into vowel-like (V) and nonvowel-like (NV) intervals. The algorithm has three stages: measurement (parameter extraction), phonetic feature detection, and V/NV decision. In the measurement stage, four parameters are computed: the broad-band rms energy, the back-to-total cavity volume ratio (BTR), the signed front-to-back maximum area ratio (SFBR), and the normalized high- to low-frequency energy ratio (HLR). The BTR and SFBR are new parameters derived from linear prediction area functions, and they are interpreted in terms of the speech spectrum. The BTR is useful for distinguishing nasal segments from V segments, while the SFBR is effective for detecting the bursts of voiced plosives. In the phonetic feature detection stage, three independent types of intervals are detected from these parameters: silence, preliminary V/NV, and turbulence noise. The V/NV decision stage then makes the final V/NV interval decisions. Interspeaker differences are handled by normalizing the frequency scale on the basis of an estimated average vocal-tract length. For ten sentences spoken by each of two male and two female speakers, the algorithm made 93.3 percent correct V/NV segment-detection decisions (92.9 percent for the design speakers and 93.7 percent for the test speakers).
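The abstract says the BTR and SFBR are derived from linear prediction area functions. As a minimal sketch of that standard step (not the authors' code), the Python below computes a frame's broad-band rms energy, its reflection (PARCOR) coefficients via the Levinson-Durbin recursion, and a relative lossless-tube area function. The sign convention A[i+1] = A[i](1 - k)/(1 + k), the unit lip area, and all function names are assumptions for illustration. The paper's exact BTR, SFBR, and HLR definitions are not reproduced here.

```python
import math
from typing import List, Sequence


def rms_energy(frame: Sequence[float]) -> float:
    """Broad-band rms energy of one frame: sqrt(mean(x[n]^2))."""
    if not frame:
        raise ValueError("empty frame")
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of a (typically pre-windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return reflection coefficients k_1..k_order from r[0..order]."""
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a = [0.0] * (order + 1)  # predictor coefficients a[1..i]; a[0] unused
    err = r[0]               # prediction-error energy of the previous order
    ks = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        if abs(k) >= 1.0:
            raise ValueError("unstable model: |k| >= 1")
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        ks.append(k)
    return ks


def area_function(ks: Sequence[float], lip_area: float = 1.0) -> List[float]:
    """Relative areas from the lips (index 0) toward the glottis.

    Assumed convention: A[i+1] = A[i] * (1 - k_i) / (1 + k_i).
    """
    areas = [lip_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


if __name__ == "__main__":
    # An AR(1)-like autocorrelation r[k] = 0.5**k gives k_1 = 0.5, k_2 = 0.
    ks = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(ks)                       # [0.5, 0.0]
    print(area_function(ks))        # approximately [1.0, 0.333, 0.333]
    print(rms_energy([3.0, -4.0]))  # sqrt(12.5), about 3.536
```

In practice the frame would be pre-emphasized and windowed before `autocorrelation`, and ratios such as a back-to-total volume would then be formed from the returned areas, using whatever section boundaries the paper specifies.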
Keywords :
Area measurement; Computer vision; Energy measurement; Feature extraction; Frequency estimation; Frequency measurement; Parameter extraction; Speech; Testing; Volume measurement;
Journal_Title :
Acoustics, Speech and Signal Processing, IEEE Transactions on
DOI :
10.1109/TASSP.1979.1163251