DocumentCode
417160
Title
Performance comparisons of all-pass transform adaptation with maximum likelihood linear regression
Author
McDonough, John ; Waibel, Alex
Author_Institution
Inst. für Logik, Komplexität und Deduktionssysteme, Karlsruhe Univ., Germany
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
All-pass transform (APT) adaptation transforms the cepstral means of a hidden Markov model so as to mimic the effect of warping the short-time frequency axis of a segment of speech, much like vocal tract length normalization (VTLN). Unlike VTLN, however, APT adaptation can be implemented as a linear transformation in the cepstral domain, much like the better-known maximum likelihood linear regression (MLLR). Recent work demonstrated that APT adaptation outperforms MLLR on a large-vocabulary conversational speech recognition task. This work presents similar comparisons on the Switchboard corpus. We found that without VTLN, the best MLLR and APT systems achieved word error rates (WERs) of 43.0% and 40.2%, respectively. Similarly, with VTLN the respective error rates were 40.3% and 39.2%, so that APT adaptation is significantly better in both cases. We also undertook a set of experiments to determine whether APT adaptation can be combined with a linear semi-tied covariance (STC) transform. With a single APT per speaker, the application of STC reduced the WER from 42.9% to 39.4%.
Keywords
cepstral analysis; error statistics; hidden Markov models; natural languages; speech recognition; MLLR; WER; all-pass transform adaptation; cepstral means; conversational speech recognition; hidden Markov model; maximum likelihood linear regression; semi-tied covariance transform; short-time frequency axis warping; switchboard corpus; vocal tract length normalization; word error rates; Cepstral analysis; Error analysis; Frequency; Hidden Markov models; Interactive systems; Laboratories; Maximum likelihood estimation; Maximum likelihood linear regression; Speech recognition; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1325985
Filename
1325985