Title :
State-dependent mixture tying with variable codebook size for accented speech recognition
Author :
Liu Yi ; Zheng Fang ; He Lei ; Xia Yunqing
Author_Institution :
Tsinghua Nat. Lab. for Inf. Sci. & Technol., Beijing
Abstract :
In this paper, we propose state-dependent tied mixture (SDTM) models with variable codebook size to improve model robustness to accented phonetic variations while maintaining the models' discriminative ability. State tying and mixture tying are combined to generate SDTM models. Compared to a pure mixture tying system, the SDTM model uses state tying to preserve state identity; compared to a pure state tying system, it uses a small set of parameters and discards overlapping mixture distributions for robust model estimation. The codebook size of an SDTM model varies according to the confusion probability of its states: the more confusable a state is, the larger its codebook, which gives a higher degree of model resolution. The codebook size is governed by the state-level variation probability of accented phonetic confusions, which can be extracted automatically by frame-to-state alignment based on local model mismatch. The effectiveness of this approach is evaluated on Mandarin accented speech. Our method yields significant absolute word error rate reductions of 2.1%, 9.5% and 3.5% compared with state tying, mixture tying and state-based phonetic tied mixtures, respectively.
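The abstract states only that a more confusable state receives a larger codebook; it does not give the mapping. The following is a minimal sketch of one such monotone mapping, assuming a linear interpolation between a minimum and maximum size. The function name `codebook_size`, the bounds `min_size` and `max_size`, and the state labels and probabilities are all hypothetical and are not taken from the paper.

```python
def codebook_size(confusion_prob, min_size=4, max_size=32):
    """Map a state's confusion probability in [0, 1] to a codebook size.

    Assumption: linear interpolation between min_size and max_size.
    The abstract only says the size grows with confusability.
    """
    if not 0.0 <= confusion_prob <= 1.0:
        raise ValueError("confusion_prob must lie in [0, 1]")
    return min_size + round(confusion_prob * (max_size - min_size))


# Hypothetical per-state confusion probabilities (illustrative values only).
state_confusion = {"a_s2": 0.05, "zh_s1": 0.60, "n_s3": 0.95}

# More confusable states are assigned larger codebooks.
sizes = {state: codebook_size(p) for state, p in state_confusion.items()}
```

In the paper the confusion probabilities come from frame-to-state alignment; here they are simply supplied as inputs.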
Keywords :
Gaussian distribution; estimation theory; hidden Markov models; natural language processing; speech coding; speech recognition; variable rate codes; Mandarin accented speech; absolute word error rate reduction; accented phonetic confusion probability; accented phonetic variation; accented speech recognition; hidden Markov model; pure mixture tying system; robust model estimation; state-dependent mixture tying model; state-dependent tied mixture model; variable codebook size; Helium; Laboratories; Natural languages; Robustness; State estimation; Technological innovation; Training data; State-dependent tied mixture models;
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430128