DocumentCode
2018470
Title
Phonetic clustering based confidence measure for embedded speech recognition
Author
Wang, Zhi-Guo ; Liu, Cong ; Wang, Hai-Kun ; Hu, Yu ; Dai, Li-Rong
Author_Institution
iFLYTEK Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2010
fDate
Nov. 29 2010-Dec. 3 2010
Firstpage
186
Lastpage
189
Abstract
Confidence measures (CMs) based on word posterior probability (WPP) have been applied successfully in large vocabulary continuous speech recognition (LVCSR) tasks. For embedded speech recognition, however, where system resources are limited, both the performance of the CM and the efficiency of the algorithm must be considered. One of the most important issues in calculating WPP is how to obtain a reliable estimate of the normalization term. In this paper we therefore investigate several methods for estimating the normalization term, focusing on methods that use different phone-based grammars. Furthermore, to achieve a good trade-off between performance and efficiency on embedded systems, we present a general approach to estimating WPP-based confidence scores using data-driven phonetic clustering, in which Kullback-Leibler divergence (KLD) is employed to group all phones into clusters. The corresponding acoustic and language models used to calculate the CM score can then be re-trained on the clustered phones. Experimental results on several Mandarin command word and digit recognition tasks show that the proposed method significantly improves efficiency with little degradation in CM performance, saving more than 90% of the processing time of the CM module.
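The abstract does not say how the KLD-based grouping is carried out, so the following is only a minimal sketch of one way such data-driven phone clustering could look. It assumes each phone is summarized by a single diagonal-covariance Gaussian, uses a symmetrized KLD as the distance, and merges clusters greedily with average linkage; the function names, the linkage rule, and the toy phone models are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kld(p, q):
    """Symmetrized divergence KL(p||q) + KL(q||p); p and q are (mean, var) pairs."""
    return kl_diag_gauss(p[0], p[1], q[0], q[1]) + kl_diag_gauss(q[0], q[1], p[0], p[1])


def cluster_phones(phone_models, num_clusters):
    """Greedy agglomerative clustering of phones (assumed average linkage)."""
    clusters = [[name] for name in sorted(phone_models)]

    def linkage(a, b):
        # Average pairwise symmetric KLD between the members of two clusters.
        dists = [symmetric_kld(phone_models[x], phone_models[y]) for x in a for y in b]
        return sum(dists) / len(dists)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical one-dimensional phone models: name -> (mean vector, variance vector).
toy_models = {
    "b": ([0.0], [1.0]),
    "p": ([0.2], [1.0]),
    "a": ([5.0], [1.0]),
    "o": ([5.3], [1.0]),
}

print(cluster_phones(toy_models, 2))
```

In this toy setup the two acoustically close pairs end up in separate clusters. A smaller set of clustered units would presumably shrink the phone-based grammar used for the normalization term, which is the kind of efficiency gain the abstract reports.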
Keywords
acoustic signal processing; embedded systems; probability; speech processing; speech recognition; text analysis; Kullback-Leibler divergence; Mandarin command word; acoustic models; confidence measure; digit recognition tasks; embedded speech recognition; embedded system; language models; normalization term; phone-based grammar; phonetic clustering; reliable estimation; system resource; word posterior probability; Acoustic measurements; Acoustics; Approximation methods; Computational modeling; Grammar; Speech; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location
Tainan
Print_ISBN
978-1-4244-6244-5
Type
conf
DOI
10.1109/ISCSLP.2010.5684914
Filename
5684914