Title :
Automatic labeling system using speaker-dependent phonetic unit references
Author :
Makino, Shozo ; Wakita, Hisashi
Author_Institution :
Speech Technology Laboratory, Santa Barbara, CA, USA
Abstract :
This paper describes a new automatic labeling system that uses speaker-dependent reference patterns for 73 phonetic units of American English. The system segments arbitrary utterances into phonetic units and adapts automatically to a new speaker from a small set of training words. Labeling of the training words starts with the words that can easily be segmented into the required phonetic units; reference patterns for each unit are then computed by vector quantization clustering. Using these training reference patterns together with vocalic/consonantal information, the input speech is aligned with its transcription by dynamic programming with duration constraints for each phonetic unit. More accurate phonetic boundaries are then obtained with new reference patterns derived from the input speech itself. The system was evaluated on 15 repetitions of 104 words spoken by two male speakers and one female speaker. The standard deviation of the differences between manually labeled and automatically obtained boundaries ranged from 21 ms to 27 ms. Most of the discrepancies occurred at boundaries between vowels, nasals, and liquids.
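The alignment step summarized above (matching input frames against per-unit reference patterns by dynamic programming under per-unit duration limits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each phonetic unit in the transcription is assumed to be represented by a small VQ codebook, the local distance is assumed to be the Euclidean distance to the nearest codevector, durations are assumed to be expressed as minimum/maximum frame counts, and the 10 ms frame shift used to convert boundary errors to milliseconds is hypothetical. The function names `local_costs`, `align`, and `boundary_error_std_ms` are illustrative only, and the second pass that re-estimates reference patterns from the input speech is not shown.

```python
import numpy as np


def local_costs(frames, codebooks):
    """Cost of assigning each frame to each unit of the transcription.

    frames: (T, D) array of feature vectors.
    codebooks: list of (M_k, D) arrays, one per unit in the transcription
    (assumption: a unit that occurs twice is simply passed twice).
    Returns a (K, T) array: Euclidean distance to the nearest codevector.
    """
    costs = np.empty((len(codebooks), len(frames)))
    for k, cb in enumerate(codebooks):
        diff = frames[:, None, :] - cb[None, :, :]
        costs[k] = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return costs


def align(frames, codebooks, min_dur, max_dur):
    """Split the frames into len(codebooks) consecutive segments by DP.

    Unit k must span between min_dur[k] and max_dur[k] frames inclusive
    (min_dur[k] >= 1 is assumed). Returns the start frame of units 1..K-1,
    i.e. the internal phonetic boundaries.
    """
    costs = local_costs(frames, codebooks)
    K, T = costs.shape
    # Prefix sums: cost of unit k over frames [s, t) is cum[k, t] - cum[k, s].
    cum = np.concatenate([np.zeros((K, 1)), np.cumsum(costs, axis=1)], axis=1)
    D = np.full((K + 1, T + 1), np.inf)  # D[k, t]: best cost of k units on t frames
    back = np.zeros((K + 1, T + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, K + 1):
        for t in range(1, T + 1):
            for d in range(min_dur[k - 1], min(max_dur[k - 1], t) + 1):
                cand = D[k - 1, t - d] + cum[k - 1, t] - cum[k - 1, t - d]
                if cand < D[k, t]:
                    D[k, t] = cand
                    back[k, t] = t - d  # start frame of unit k
    if not np.isfinite(D[K, T]):
        raise ValueError("no segmentation satisfies the duration constraints")
    # Trace back the start frame of every unit, last unit first.
    starts, t = [], T
    for k in range(K, 0, -1):
        t = back[k, t]
        starts.append(t)
    starts.reverse()
    return starts[1:]  # the first unit always starts at frame 0


def boundary_error_std_ms(auto, manual, frame_ms=10.0):
    """Standard deviation (ms) of automatic minus manual boundary positions."""
    diffs = (np.asarray(auto) - np.asarray(manual)) * frame_ms
    return float(np.std(diffs))
```

For example, with three units whose codebooks sit at (0, 0), (5, 5), and (0, 5), and 12 frames laid out as 4 + 3 + 5 frames at those points, `align` with loose duration limits recovers the boundaries at frames 4 and 7; tightening the first unit's maximum duration to 3 frames forces the first boundary to frame 3. The prefix sums keep each duration candidate at constant cost, so the search is O(K·T·d_max) rather than re-summing segment costs.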
Keywords :
Cepstral analysis; Data mining; Dynamic programming; Labeling; Laboratories; Liquids; Reproducibility of results; Speech recognition; Speech synthesis; Vector quantization;
Conference_Title :
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '86.
DOI :
10.1109/ICASSP.1986.1168617