Title :
A Study on Invariance of f-Divergence and Its Application to Speech Recognition
Author :
Qiao, Yu ; Minematsu, Nobuaki
Author_Institution :
Shenzhen Inst. of Adv. Technol., Shenzhen, China
fDate :
7/1/2010
Abstract :
Identifying features invariant to certain transformations is a fundamental problem in the fields of signal processing and pattern recognition. This correspondence explores a family of measures called f-divergences that are invariant to invertible transformations, and studies their application to speech recognition. We provide novel proofs for the sufficiency and necessity of the invariance of f-divergence. Several techniques to calculate or approximate f-divergences in general cases and for special distributions such as Gaussian and Gaussian mixture are reviewed. We show how to construct an invariant structural representation from sequence data through maximum likelihood decomposition, and prove the invariance of this decomposition. We demonstrate an application of this invariant representation to recognizing connected Japanese vowel utterances. In addition, we propose several techniques to improve the recognition performance. The experimental results show that the invariant structure achieves better performance than hidden Markov models, a widely used technique for acoustic modeling of speech sounds.
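The invariance claim can be checked numerically in the simplest setting. The sketch below is not taken from the paper: it assumes one-dimensional Gaussians, uses the Kullback-Leibler divergence (the f-divergence with f(t) = t log t) in its standard closed form, and restricts the invertible transformation to an affine map y = a*x + b with a != 0. The function names `kl_gauss_1d` and `affine`, and all parameter values, are illustrative choices.

```python
import math


def kl_gauss_1d(m1, s1, m2, s2):
    """Closed-form KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def affine(m, s, a, b):
    """Mean and standard deviation of y = a*x + b when x ~ N(m, s^2)."""
    if a == 0:
        raise ValueError("a must be nonzero for the map to be invertible")
    return a * m + b, abs(a) * s


# Two illustrative distributions, given as (mean, standard deviation).
p = (0.0, 1.0)
q = (1.5, 2.0)

# An arbitrary invertible affine map applied to both distributions.
a, b = -3.0, 4.0

before = kl_gauss_1d(*p, *q)
after = kl_gauss_1d(*affine(*p, a, b), *affine(*q, a, b))

# The two values agree up to floating-point rounding.
print(before, after)
```

The affine case is only a special instance of the invertible transformations covered by the abstract; it is used here because both transformed distributions remain Gaussian, so the closed form applies before and after the map.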
Keywords :
Gaussian processes; maximum likelihood estimation; speech recognition; Gaussian mixture; Japanese vowel utterances; f-divergence; invariant structural representation; invertible transformation; maximum likelihood decomposition; invariance to transformation; structural representation
Journal_Title :
Signal Processing, IEEE Transactions on
DOI :
10.1109/TSP.2010.2047340