Automatic labeling system using speaker-dependent phonetic unit references

Author

Makino, Shozo; Wakita, Hisashi

Author_Institution

Speech Technology Laboratory, Santa Barbara, CA, USA

Volume

11

fYear

1986

fDate

April 1986

Firstpage

2783

Lastpage

2786

Abstract

This paper describes a new automatic labeling system that uses speaker-dependent reference patterns for 73 phonetic units of American English. The system segments arbitrary utterances into phonetic units and adapts automatically to a new speaker from a small set of training words. Labeling of the training words begins with the words that can easily be segmented into the required phonetic units; reference patterns for each unit are then computed by vector quantization clustering. Using these training reference patterns together with vocalic/consonantal information, the system aligns the speech input with its transcription by dynamic programming with duration constraints for each phonetic unit. More accurate phonetic boundaries are then obtained with new reference patterns derived from the input speech itself. The system was evaluated on 15 repetitions of 104 words uttered by two male speakers and one female speaker. The standard deviation of the differences between manually labeled and automatically obtained boundaries ranged from 21 ms to 27 ms. Most of the discrepancies occurred at boundaries between vowels, nasals, and liquids.
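
The abstract outlines a pipeline of VQ clustering of frames into per-unit reference patterns, dynamic-programming alignment of the input with its transcription under per-unit duration limits, and a second pass that re-estimates the references from the input itself. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The feature vectors, the squared Euclidean local distance, plain k-means as the VQ clustering step, duration limits counted in frames, and the function names vq_codebook, align, and refine are all hypothetical assumptions, and the vocalic/consonantal information the paper also uses is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Segment = Tuple[str, int, int]  # (unit, start frame, end frame exclusive)


def sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance (an assumed local distance measure).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_codebook(frames: List[Vector], k: int, iters: int = 10) -> List[List[float]]:
    """Plain k-means standing in for VQ clustering; assumes frames is non-empty and k >= 1."""
    k = min(k, len(frames))
    step = len(frames) / k
    # Deterministic initialization: k evenly spaced frames.
    centroids = [list(frames[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            j = min(range(k), key=lambda c: sq_dist(f, centroids[c]))
            clusters[j].append(f)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def align(frames: List[Vector], transcript: List[str],
          codebooks: Dict[str, List[List[float]]],
          durations: Dict[str, Tuple[int, int]]) -> List[Segment]:
    """DP segmentation of frames into the transcript's units; durations are (min, max) frames, min >= 1."""
    T, N = len(frames), len(transcript)
    # Local cost: distance from each frame to the nearest reference vector of each unit.
    prefix = []
    for u in transcript:
        p = [0.0]
        for f in frames:
            p.append(p[-1] + min(sq_dist(f, c) for c in codebooks[u]))
        prefix.append(p)  # prefix sums give O(1) segment costs
    # D[n][t]: best cost of covering frames[:t] with the first n units.
    D = [[math.inf] * (T + 1) for _ in range(N + 1)]
    back = [[-1] * (T + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for n in range(1, N + 1):
        dmin, dmax = durations[transcript[n - 1]]
        for t in range(1, T + 1):
            for d in range(dmin, min(dmax, t) + 1):
                s = t - d
                if D[n - 1][s] == math.inf:
                    continue
                cost = D[n - 1][s] + prefix[n - 1][t] - prefix[n - 1][s]
                if cost < D[n][t]:
                    D[n][t], back[n][t] = cost, s
    if D[N][T] == math.inf:
        raise ValueError("no alignment satisfies the duration constraints")
    segs, t = [], T
    for n in range(N, 0, -1):  # trace back the unit boundaries
        s = back[n][t]
        segs.append((transcript[n - 1], s, t))
        t = s
    return segs[::-1]


def refine(frames: List[Vector], transcript: List[str],
           codebooks: Dict[str, List[List[float]]],
           durations: Dict[str, Tuple[int, int]], k: int = 2) -> List[Segment]:
    """Second pass: rebuild each unit's codebook from its own aligned input frames, then realign."""
    pooled: Dict[str, List[Vector]] = {}
    for unit, s, e in align(frames, transcript, codebooks, durations):
        pooled.setdefault(unit, []).extend(frames[s:e])
    new_books = {u: vq_codebook(fs, k) for u, fs in pooled.items()}
    return align(frames, transcript, new_books, durations)
```

For example, with one-dimensional frames [[0],[0],[0],[5],[5],[5],[5],[1],[1]], single-vector codebooks {"a": [[0.0]], "b": [[5.0]], "c": [[1.0]]} and loose duration limits, align returns [("a", 0, 3), ("b", 3, 7), ("c", 7, 9)]; capping "b" at three frames instead yields [("a", 0, 3), ("b", 3, 6), ("c", 6, 9)], because the leftover frame costs less when assigned to "c" than to "a".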

Keywords

Cepstral analysis; Data mining; Dynamic programming; Labeling; Laboratories; Liquids; Reproducibility of results; Speech recognition; Speech synthesis; Vector quantization;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '86.

Type

conf

DOI

10.1109/ICASSP.1986.1168617

Filename

1168617