Robust speech recognition with on-line unsupervised acoustic feature compensation

Author

Buera, Luis ; Miguel, Antonio ; Lleida, Eduardo ; Saz, Óscar ; Ortega, Alfonso

Author_Institution

Zaragoza Univ., Zaragoza

fYear

2007

fDate

9-13 Dec. 2007

Firstpage

105

Lastpage

110

Abstract

An on-line unsupervised hybrid compensation technique is proposed to reduce the mismatch between training and testing conditions. It combines multi-environment model-based linear normalization with a cross-probability model based on GMMs (MEMLIN CPM) and a novel acoustic model adaptation method based on rotation transformations. A set of rotation transformations is estimated in an unsupervised process by linear regression between clean and MEMLIN CPM-normalized training data. In testing, each MEMLIN CPM-normalized frame is decoded using a modified Viterbi algorithm and expanded acoustic models, which are obtained from the reference models and the set of rotation transformations. The proposed solution was evaluated on the Spanish SpeechDat Car database: MEMLIN CPM over standard ETSI front-end parameters achieves an average WER improvement of 83.89%, while the hybrid solution reaches 92.07%. The hybrid technique was also tested on the Aurora 2 database, obtaining an average improvement of 68.88% with clean training.
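
The abstract says the rotation transformations are estimated by linear regression between clean and normalized training data. As a rough illustration only, the sketch below fits a single 2x2 linear map by least squares on toy 2-D points. The function name `fit_linear_map_2d`, the toy data, the clean-to-normalized mapping direction, and the use of one global transformation are all assumptions made for this example and are not taken from the paper; real feature vectors have many more dimensions.

```python
import math

def fit_linear_map_2d(src, dst):
    """Least-squares 2x2 matrix A minimizing sum ||dst_i - A @ src_i||^2.

    Hypothetical helper for illustration; not the paper's estimator.
    """
    # Accumulate S_xx = sum(x x^T) and S_yx = sum(y x^T).
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    syx = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(src, dst):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += x[i] * x[j]
                syx[i][j] += y[i] * x[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    if abs(det) < 1e-12:
        raise ValueError("source points are degenerate; cannot invert S_xx")
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # Normal equations: A = S_yx @ inverse(S_xx).
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Toy "clean" vectors (assumed data) and their "normalized" counterparts,
# simulated here as an exact 30-degree rotation of the clean vectors.
theta = math.radians(30.0)
clean = [(1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (-2.0, 0.5)]
normalized = [(math.cos(theta) * x - math.sin(theta) * y,
               math.sin(theta) * x + math.cos(theta) * y) for x, y in clean]

A = fit_linear_map_2d(clean, normalized)
# Angle implied by the fitted matrix, in degrees.
recovered_deg = math.degrees(math.atan2(A[1][0], A[0][0]))
```

Because the toy targets are an exact rotation of the inputs, the fitted matrix coincides with that rotation up to rounding. With noisy data an unconstrained least-squares fit is generally not orthogonal, and for higher-dimensional features a library solver such as `numpy.linalg.lstsq` would be the more practical choice.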

Keywords

Gaussian processes; audio acoustics; compensation; decoding; estimation theory; feature extraction; matrix algebra; probability; regression analysis; speech coding; speech recognition; unsupervised learning; vectors; GMM; MEMLIN CPM-normalized training data; Viterbi algorithm; acoustic model adaptation method; cross-probability model; feature vector normalization; linear regression; multienvironment model linear normalization; normalized frame decoding; online unsupervised acoustic feature compensation; online unsupervised hybrid compensation technique; rotation matrix estimation process; rotation transformations; speech recognition; testing conditions; training conditions; Acoustic testing; Adaptation model; Databases; Decoding; Linear regression; Robustness; Speech recognition; Telecommunication standards; Training data; Viterbi algorithm; acoustic model adaptation; feature vector normalization; robust speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on

Conference_Location

Kyoto

Print_ISBN

978-1-4244-1746-9

Electronic_ISBN

978-1-4244-1746-9

Type

conf

DOI

10.1109/ASRU.2007.4430092

Filename

4430092