Title :
Iterative self-learning speaker and channel adaptation under various initial conditions
Author_Institution :
Dept. of Electr. & Comput. Eng., Illinois Univ., Urbana, IL, USA
Abstract :
A self-learning adaptation technique is presented which handles speaker- and channel-induced spectral variations without enrolment speech. At the acoustic level, the distortion spectral bias is estimated in two steps using unsupervised maximum likelihood estimation: in the first step, the probability distributions of the speech spectral features are assumed uniform for severely mismatched channels; in the second step, the spectral bias is reestimated assuming Gaussian distributions for the spectral features. At the phone unit level, unsupervised sequential adaptation is performed via Bayesian estimation from the online, bias-removed speech data, and iterative adaptation is further performed for dictation applications. Over four 198-sentence test sets, on a continuous speech recognition task with a vocabulary of 853 words and a grammar perplexity of 105, the largest increase in average word accuracy is 85.2% from a baseline accuracy of -0.3%, and the maximum average word accuracy is 89.4% from a baseline accuracy of 56.5%.
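The second, Gaussian-based bias estimation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the sketch assumes an additive bias on the spectral features, diagonal-covariance Gaussians, hard assignment of each frame to its most likely Gaussian, and a zero initial bias (the uniform-distribution first step is not modelled). The names `estimate_bias`, `frames`, `means`, and `variances` are hypothetical.

```python
import numpy as np

def estimate_bias(frames, means, variances, n_iter=5):
    """Iteratively estimate an additive spectral bias (illustrative sketch).

    frames:    (T, D) observed feature vectors, assumed to be clean + bias
    means:     (K, D) means of the assumed diagonal Gaussians
    variances: (K, D) variances of the assumed diagonal Gaussians
    """
    bias = np.zeros(frames.shape[1])  # assumed starting point, not from the source
    for _ in range(n_iter):
        cleaned = frames - bias
        # Log-likelihood (up to a constant) of each cleaned frame under each Gaussian.
        diff = cleaned[:, None, :] - means[None, :, :]
        loglik = -0.5 * np.sum(diff ** 2 / variances + np.log(variances), axis=2)
        best = np.argmax(loglik, axis=1)  # hard assignment per frame
        # Maximum likelihood bias given the assignment:
        # precision-weighted average of (observation - assigned mean).
        precision = 1.0 / variances[best]
        bias = np.sum(precision * (frames - means[best]), axis=0) / np.sum(precision, axis=0)
    return bias

if __name__ == "__main__":
    # Synthetic check with made-up data: two well-separated Gaussians plus a known bias.
    rng = np.random.default_rng(0)
    means = np.array([[-5.0, -5.0], [5.0, 5.0]])
    variances = np.ones((2, 2))
    labels = rng.integers(0, 2, size=2000)
    true_bias = np.array([1.0, -0.5])
    frames = means[labels] + rng.normal(size=(2000, 2)) + true_bias
    print(estimate_bias(frames, means, variances))
```

Subtracting the returned bias from the frames would give the "bias-removed speech data" that the phone-unit-level adaptation then operates on; how the paper couples the two stages is not specified in this record.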
Keywords :
Bayes methods; Gaussian distribution; acoustic signal processing; adaptive signal processing; dictation; iterative methods; maximum likelihood estimation; normal distribution; spectral analysis; speech processing; speech recognition; telecommunication channels; unsupervised learning; Bayesian estimation; Gaussian distributions; acoustic level; average word accuracy; continuous speech recognition; dictation applications; distortion spectral bias estimation; grammar perplexity; initial conditions; iterative self-learning channel adaptation; iterative self-learning speaker adaptation; mismatched channels; online bias-removed speech data; phone unit level; probability distributions; sentence test sets; spectral variations; speech spectral features; uniform distribution; unsupervised maximum likelihood estimation; unsupervised sequential adaptation; vocabulary size; Acoustic distortion; Bayesian methods; Delay estimation; Gaussian distribution; Hidden Markov models; Loudspeakers; Maximum likelihood decoding; Maximum likelihood estimation; Microphones; Probability distribution; Speech recognition; Testing;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
Conference_Location :
Detroit, MI
Print_ISBN :
0-7803-2431-5
DOI :
10.1109/ICASSP.1995.479793