Abstract :
This paper introduces a unified framework for the online adaptation of hidden Markov model (HMM) parameters to real-life conditions, with the aim of improving the robustness of speech recognition systems. It also describes techniques developed to control the convergence of adaptation in unsupervised mode. Classically, two approaches have been used to adapt HMM parameters to new conditions: Bayesian adaptation and spectral transformation, the latter generally using linear regression. This paper lays out a unifying framework in which both Bayesian adaptation and spectral transformation are particular cases. The framework assigns one transformation to each Gaussian distribution and automatically partitions the Gaussian distributions into classes according to the adaptation data. The transformations within each class share the same parameter vector, so the number of degrees of freedom of the global transformation is data-driven. The parameters of the global transformation are estimated according to the maximum a posteriori (MAP) criterion, using the original HMM a priori distributions. The general adaptation algorithm has been implemented within the CNET speech recognition system, and the whole system has been evaluated on several field telephone databases. In online unsupervised mode, the new adaptation method makes the speech recognition system converge systematically toward a system trained on field data in supervised mode.
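As a rough illustration of how tying parameters across classes of Gaussians can span both classical approaches, the sketch below estimates one MAP bias vector per class and adds it to every mean in that class. It is not the algorithm of the paper: it assumes a bias-only transformation, a fixed partition supplied by the caller (the paper derives the partition from the adaptation data), equal variances for all Gaussians, and a zero-mean Gaussian prior on each bias. The function name `map_shared_bias` and the prior weight `tau` are hypothetical.

```python
from collections import defaultdict


def map_shared_bias(means, classes, frames, posteriors, tau=10.0):
    """Adapt Gaussian means with one MAP-estimated bias per class.

    means:      list of mean vectors, one per Gaussian
    classes:    classes[g] is the class label of Gaussian g (fixed here by assumption)
    frames:     list of observation vectors from the adaptation data
    posteriors: posteriors[t][g] is the occupation probability of Gaussian g at frame t
    tau:        prior weight, must be > 0 (hypothetical hyperparameter)
    """
    dim = len(means[0])
    num = defaultdict(lambda: [0.0] * dim)  # per class: sum of gamma * (x - mu)
    occ = defaultdict(float)                # per class: sum of gamma

    for x, gammas in zip(frames, posteriors):
        for g, gamma in enumerate(gammas):
            c = classes[g]
            occ[c] += gamma
            for d in range(dim):
                num[c][d] += gamma * (x[d] - means[g][d])

    adapted = []
    for g, mu in enumerate(means):
        c = classes[g]
        # MAP bias under the stated assumptions: the prior shrinks it toward zero,
        # so a class with little data leaves its means almost unchanged.
        bias = [num[c][d] / (tau + occ[c]) for d in range(dim)]
        adapted.append([mu[d] + bias[d] for d in range(dim)])
    return adapted


# Two one-dimensional Gaussians, four adaptation frames with hard assignments.
means = [[0.0], [10.0]]
frames = [[1.0], [1.0], [12.0], [12.0]]
posteriors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

one_class = map_shared_bias(means, [0, 0], frames, posteriors, tau=4.0)
per_gaussian = map_shared_bias(means, [0, 1], frames, posteriors, tau=2.0)
```

Under these assumptions, placing every Gaussian in a single class yields one global shift of all means, in the spirit of a spectral transformation, whereas giving each Gaussian its own class reduces the update to the familiar MAP mean estimate (tau * mu + sum of gamma * x) / (tau + sum of gamma). Intermediate partitions trade off between the two according to how much adaptation data each class receives.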
Keywords :
Bayes methods; Gaussian distribution; adaptive systems; convergence of numerical methods; hidden Markov models; online operation; spectral analysis; speech recognition; telephony; unsupervised learning; Bayesian adaptation; CNET; HMM a priori distributions; HMM parameters; MAP criterion; adaptation algorithm; adaptation convergence; adaptation data; data-driven freedom degree; field-telephone databases; global transformation; linear regression; maximum a posteriori criterion; online adaptation; online unsupervised mode; parameter vector; spectral transformation; speech recognition systems; supervised mode; Automatic speech recognition; Bayesian methods; Convergence; Databases; Maximum likelihood estimation; Robustness