An approach to segmenting speech into vowel- and nonvowel-like intervals

Abstract:

A speaker-independent algorithm is given for segmenting continuous English speech into vowel-like (V) and nonvowel-like (NV) intervals. The algorithm has three stages: measurement (parameter extraction), phonetic feature detection, and V/NV decision. In the measurement stage, four parameters are computed: the broad-band rms energy, the back-to-total cavity volume ratio (BTR), the signed front-to-back maximum area ratio (SFBR), and the normalized high-to-low frequency energy ratio (HLR). The BTR and SFBR are new parameters derived from linear prediction area functions and are interpreted in terms of the speech spectrum. The BTR is useful for distinguishing nasal segments from V segments, while the SFBR is effective for detecting the bursts of voiced plosives. In the phonetic feature detection stage, three independent types of intervals are detected on the basis of these parameters: silence, preliminary V/NV, and turbulence noise. The V/NV decision stage then makes the final V/NV interval decision. Interspeaker differences are handled by normalizing the frequency scale on the basis of an estimated average vocal-tract length. On ten sentences spoken by each of two male and two female speakers, 93.3 percent of the V/NV segment-detection decisions were correct (92.9 percent for design speakers and 93.7 percent for test speakers).
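
As a rough illustration of the measurement stage, the sketch below computes two of the four parameters named above, the broad-band rms energy and a high-to-low frequency energy ratio, on a frame-by-frame basis. It is not the authors' implementation. The abstract does not give the frame length, the band boundary, or the exact normalization of the HLR, so the 2000 Hz split, the form high / (high + low), and all function names are assumptions made for this example. The BTR and SFBR are omitted because their definitions are not stated in the abstract.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames.

    A trailing partial frame is dropped.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms_energy(frame):
    """Broad-band root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def band_energies(frame, sample_rate, split_hz):
    """Return (low, high) spectral energy below and at/above split_hz.

    Uses a naive O(n^2) DFT so the sketch needs no third-party library.
    The DC bin and the Nyquist bin are skipped.
    """
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += power
        else:
            high += power
    return low, high


def hlr(frame, sample_rate, split_hz=2000.0):
    """Assumed normalization: high-band energy over total energy, in [0, 1].

    The split frequency and this formula are illustrative choices,
    not values taken from the paper.
    """
    low, high = band_energies(frame, sample_rate, split_hz)
    total = low + high
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    rate = 8000  # hypothetical sampling rate in Hz
    tone = [math.sin(2 * math.pi * 250 * t / rate) for t in range(256)]
    for frame in frame_signal(tone, frame_len=64, hop=32):
        print(round(rms_energy(frame), 3), round(hlr(frame, rate), 3))
```

Under these assumptions, a frame dominated by low-frequency energy (as in most vowel-like sounds) gives an HLR near 0, while a frame dominated by high-frequency noise gives an HLR near 1. A real system would replace the naive DFT with an FFT and would tune the band split to the speaker-normalized frequency scale described above.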