DocumentCode
1092489
Title
An approach to segmenting speech into vowel- and nonvowel-like intervals
Author
Kasuya, Hideki ; Wakita, Hisashi
Author_Institution
Utsunomiya University, Utsunomiya, Japan
Volume
27
Issue
4
fYear
1979
fDate
8/1/1979 12:00:00 AM
Firstpage
319
Lastpage
327
Abstract
A speaker-independent algorithm is given for segmenting continuous speech in English into vowel-like (V) and nonvowel-like (NV) intervals. The algorithm has three stages: measurements (parameter extraction), phonetic feature detection, and V/NV decision. In measurements, the broad-band rms energy, the back-to-total cavity volume ratio (BTR), the signed front-to-back maximum area ratio (SFBR), and the normalized high-to-low frequency energy ratio (HLR) are computed. The BTR and SFBR are new parameters derived from linear prediction area functions and are interpreted in terms of the speech spectrum. The BTR is useful for distinguishing nasal segments from V segments, while the SFBR is effective for detecting the bursts of voiced plosives. In phonetic feature detection, three independent types of intervals are detected on the basis of the parameters: silence, preliminary V/NV, and turbulence noise. The V/NV decision stage accomplishes the final V/NV interval decision. Interspeaker differences are handled by normalizing the frequency scale on the basis of an estimated average vocal-tract length. Ten sentences spoken by each of two males and two females resulted in 93.3 percent correct V/NV segment-detection decisions (92.9 percent for design speakers, and 93.7 percent for test speakers).
Keywords
Area measurement; Computer vision; Energy measurement; Feature extraction; Frequency estimation; Frequency measurement; Parameter extraction; Speech; Testing; Volume measurement;
fLanguage
English
Journal_Title
Acoustics, Speech and Signal Processing, IEEE Transactions on
Publisher
IEEE
ISSN
0096-3518
Type
jour
DOI
10.1109/TASSP.1979.1163251
Filename
1163251