DocumentCode :
839128
Title :
Speaker adaptive modeling by vocal tract normalization
Author :
Welling, Lutz ; Ney, Hermann ; Kanthak, Stephan
Author_Institution :
Comput. Sci. Dept., Rheinisch-Westfälische Tech. Hochschule, Aachen, Germany
Volume :
10
Issue :
6
fYear :
2002
fDate :
9/1/2002
Firstpage :
415
Lastpage :
426
Abstract :
This paper presents methods for speaker adaptive modeling using vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new training method for VTN: By using single-density acoustic models per HMM state for selecting the scale factor of the frequency axis, we avoid the problem that a mixture-density tends to learn the scale factors of the training speakers and thus cannot be used for selecting the scale factor. We show that using single Gaussian densities for selecting the scale factor in training results in lower error rates than using mixture densities. For the recognition phase, we propose an improvement of the well-known two-pass strategy: by using a non-normalized acoustic model for the first recognition pass instead of a normalized model, lower error rates are obtained. In recognition tests, this method is compared with a fast variant of VTN. The two-pass strategy is an efficient method, but it is suboptimal because the scale factor and the word sequence are determined sequentially. We found that for telephone digit string recognition this suboptimality reduces the VTN gain in recognition performance by 30% relative. In summary, on the German spontaneous speech task Verbmobil, the WSJ task and the German telephone digit string corpus SieTill, the proposed methods for VTN reduce the error rates significantly.
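Illustrative sketch (not taken from the paper): the abstract says the scale factor is selected with a single-density acoustic model per HMM state. One plausible reading is a grid search that keeps the candidate scale factor with the highest total log-likelihood under one Gaussian per state. Everything specific below is an assumption made for illustration: the toy one-dimensional "peak frequency" feature, the candidate grid of 0.88 to 1.12 in steps of 0.02, the fixed state alignment, and the names select_scale_factor and gaussian_loglik. Real VTN warps the frequency axis of the spectrum before feature extraction rather than scaling a scalar feature.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a scalar x under a Gaussian N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def select_scale_factor(frames, alignment, state_means, state_vars, candidates):
    """Return the candidate scale factor with the highest total log-likelihood.

    frames      : one toy feature per frame (here: a peak frequency in Hz)
    alignment   : HMM state index per frame (assumed to be given)
    state_means : mean of the single Gaussian of each state
    state_vars  : variance of the single Gaussian of each state
    candidates  : scale factors to try (hypothetical grid)
    """
    best_alpha, best_score = None, -math.inf
    for alpha in candidates:
        # Toy "warping": scale the frequency-like feature by alpha.
        score = sum(
            gaussian_loglik(alpha * x, state_means[s], state_vars[s])
            for x, s in zip(frames, alignment)
        )
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Hypothetical candidate grid: 0.88, 0.90, ..., 1.12
CANDIDATES = [round(0.88 + 0.02 * i, 2) for i in range(13)]

# Toy single-Gaussian model with three states (values are made up).
state_means = [500.0, 1500.0, 2500.0]
state_vars = [100.0 ** 2] * 3

# Simulate a speaker whose frequencies are compressed by a factor of 1.08.
true_alpha = 1.08
alignment = [0, 1, 2, 1, 0, 2]
frames = [state_means[s] / true_alpha for s in alignment]

chosen = select_scale_factor(frames, alignment, state_means, state_vars, CANDIDATES)
print(chosen)
```

In this toy setup the search recovers the factor used to generate the data. With a mixture density per state, some mixture components could fit the unnormalized speakers directly, which is the problem the abstract says single densities avoid.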
Keywords :
Gaussian processes; acoustic signal processing; adaptive signal processing; hidden Markov models; speech recognition; German spontaneous speech task; German telephone digit string corpus; HMM state; SieTill; Verbmobil; WSJ task; databases; error rate reduction; frequency scale factor; nonnormalized acoustic model; single Gaussian densities; single-density acoustic models; speaker adaptive modeling; telephone digit string recognition; training method; training results; training speakers; two-pass strategy; vocal tract normalization; word sequence; Acoustic emission; Acoustic testing; Databases; Error analysis; Frequency; Hidden Markov models; Loudspeakers; Performance gain; Speech recognition; Telephony;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/TSA.2002.803435
Filename :
1040265