DocumentCode :
3109501
Title :
HMM-supervised classification of the NMF components for robust speech recognition
Author :
Nengheng Zheng ; Xia Li ; Yi Cai
Author_Institution :
Shenzhen Key Lab. of Telecommun. & Inf. Process., Shenzhen Univ., Shenzhen, China
fYear :
2012
fDate :
9-12 Dec. 2012
Firstpage :
83
Lastpage :
87
Abstract :
This paper presents a nonnegative matrix factorization (NMF)-based source separation algorithm for robust speech recognition under music interference. NMF is applied to decompose the mixture signal into a set of basis vectors and corresponding gain vectors, each belonging to either speech or music. Source separation is achieved by classifying the NMF components, i.e., the basis vectors and their corresponding gain vectors, into their respective classes. Hidden Markov models (HMMs) are incorporated to supervise the classification. More specifically, the likelihood score output by the Viterbi search, i.e., the probability of the input speech given the recognized word models, is adopted as the classification criterion, so that the separated speech consists of those NMF components that contribute positively to the Viterbi search score. As a result, the recognition output after the separation processing is mostly confident. Automatic speech recognition experiments demonstrate that the proposed source separation algorithm significantly improves the robustness of the recognition system under music interference.
Keywords :
hidden Markov models; matrix decomposition; maximum likelihood estimation; signal classification; source separation; speech recognition; vectors; HMM-supervised classification; NMF component; NMF-based source separation algorithm; Viterbi search; basis vector; classification criterion; gain vector; hidden Markov model; input speech probability; likelihood score output; music interference; nonnegative matrix factorization; robust speech recognition; separation processing; Acoustics; Hidden Markov models; Source separation; Speech; Speech recognition; Support vector machine classification; Viterbi algorithm; Speech separation; nonnegative matrix factorization; speech and music; speech recognition; supervised classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-2811-1
Electronic_ISBN :
978-1-4673-2812-8
Type :
conf
DOI :
10.1109/ICSDA.2012.6422467
Filename :
6422467