Title :
Kernel metric learning for phonetic classification
Author :
Huang, Jui-Ting ; Zhou, Xi ; Hasegawa-Johnson, Mark ; Huang, Thomas
Author_Institution :
Beckman Inst., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Date :
Nov. 13 2009-Dec. 17 2009
Abstract :
Although a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework that automatically emphasizes the speech frames most relevant to phonetic information. We jointly learn the importance of speech frames through a distance metric across the phone classes, attempting to satisfy a large margin constraint: the distance from a segment to its correct label class should be less than the distance to any other phone class by the largest possible margin. Furthermore, a universal background model structure is proposed to give the correspondence between statistical models of phone types and tokens, allowing us to use statistical models of each phone token in a large margin speech recognition framework. Experiments on the TIMIT database demonstrate the effectiveness of our framework.
Keywords :
speech recognition; statistical analysis; kernel metric learning; phonetic classification; speech frames emphasis; speech recognition framework; statistical models; Acoustical engineering; Computational efficiency; Hidden Markov models; Humans; Kernel; Machine learning; Phase estimation; Proportional control; Spatial databases; Speech recognition;
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
DOI :
10.1109/ASRU.2009.5373389