Title :
Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching
Author :
Ni, Jinfu ; Kawai, Hisashi
Author_Institution :
ATR Spoken Language Translation Res. Labs., Kyoto, Japan
Abstract :
A functional fundamental frequency (F0) model is applied to extract tone peak and gliding features from Mandarin F0 contours, with the aim of automatically labeling the prosody of a large-scale speech corpus. Modeling the four lexical tones and representing them in a parametric form based on the F0 model, we first cluster baseline tone patterns using the LBG (Linde-Buzo-Gray) algorithm, then perform analysis-by-synthesis-based pattern matching to estimate the underlying tone peaks and tone pattern types from observed F0 contours and phonetic labels with lexical tones. Tone gliding features are re-estimated after the tone peaks have been determined. In an open test of 968 utterances from eight native speakers, 94% of the automatically estimated labels were consistent with the manual labels. The experimental results also indicate that the proposed method is applicable to F0 contour smoothing and tone verification.
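The clustering step named in the abstract can be illustrated with a generic LBG codebook trainer. This is a minimal sketch of the textbook splitting variant of the algorithm, not the authors' implementation: the input vectors merely stand in for parametric tone-pattern vectors, and the function name `lbg` and its parameters `size`, `eps`, `tol`, and `max_iter` are hypothetical choices made for this illustration.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(codebook[k], v))


def lbg(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Train a codebook by repeated splitting (assumes size is a power of two)."""
    # Start from a single codeword: the centroid of all training vectors.
    codebook = [mean_vector(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = [[c + s * eps * (abs(c) + 1.0) for c in cw]
                    for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            # Assign each vector to its nearest codeword.
            cells = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                k = nearest(codebook, v)
                cells[k].append(v)
                distortion += squared_distance(codebook[k], v)
            distortion /= len(vectors)
            # Move each codeword to the centroid of its cell;
            # an empty cell keeps its previous codeword.
            codebook = [mean_vector(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            # Stop once the average distortion no longer improves.
            if prev - distortion <= tol * max(distortion, 1e-12):
                break
            prev = distortion
    return codebook
```

Calling `lbg(vectors, 4)` on a list of equal-length feature vectors returns four codewords. How the paper parameterizes tone patterns and how many clusters it uses are not stated in the abstract, so both are left open here.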
Keywords :
feature extraction; natural languages; parameter estimation; pattern matching; speech processing; speech recognition; LBG algorithm; Linde-Buzo-Gray algorithm; Mandarin; analysis-by-synthesis; automatic prosodic labeling; contour smoothing; functional model; fundamental frequency; lexical tones; parametric modeling; speech corpus; tone feature extraction; tone gliding features; tone peak features; tone verification; Clustering algorithms; Frequency; Labeling; Large-scale systems; Parametric statistics; Pattern analysis; Performance analysis; Speech
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198719