• DocumentCode
    2018470
  • Title

    Phonetic clustering based confidence measure for embedded speech recognition

  • Author

    Wang, Zhi-Guo ; Liu, Cong ; Wang, Hai-Kun ; Hu, Yu ; Dai, Li-Rong

  • Author_Institution
    iFLYTEK Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    186
  • Lastpage
    189
  • Abstract
    Word posterior probability (WPP) based confidence measures (CMs) have been applied successfully in LVCSR tasks. However, for embedded speech recognition, where system resources are limited, both the performance of the CM and the efficiency of the algorithm must be considered. One of the most important issues in calculating WPP is how to obtain a reliable estimate of the normalization term. In this paper we investigate several methods for estimating the normalization term, focusing on methods that use different phone-based grammars. Furthermore, to achieve a good trade-off between performance and efficiency for embedded systems, we present a general approach to estimating WPP-based confidence scores using data-driven phonetic clustering, in which Kullback-Leibler divergence (KLD) is employed to group all phones into clusters. The corresponding acoustic and language models for calculating the CM score can then be re-trained on the clustered phones. Experimental results on different Mandarin command word and digit recognition tasks show that the proposed method significantly improves efficiency with little degradation in CM performance, saving more than 90% of the processing time of the CM module.
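    The KLD-based grouping of phones mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the divergence is computed or which clustering procedure is used, so everything below is an assumption made for illustration: each phone is represented by a single diagonal-covariance Gaussian, the distance is the symmetrised KL divergence, and clusters are merged greedily with average linkage. The function names and the toy phone models are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kld(model_a, model_b):
    """Symmetrised divergence KL(a || b) + KL(b || a); each model is (means, variances)."""
    return (kl_diag_gauss(model_a[0], model_a[1], model_b[0], model_b[1])
            + kl_diag_gauss(model_b[0], model_b[1], model_a[0], model_a[1]))


def cluster_phones(phone_models, n_clusters):
    """Greedy agglomerative clustering with average linkage (an assumed choice)."""
    clusters = [[name] for name in sorted(phone_models)]

    def linkage(c1, c2):
        # Average pairwise divergence between the members of two clusters.
        dists = [symmetric_kld(phone_models[x], phone_models[y])
                 for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the closest pair; combinations() guarantees i < j.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-dimensional phone models: (means, variances).
toy_models = {
    "a": ([0.0, 0.0], [1.0, 1.0]),
    "o": ([0.2, 0.1], [1.0, 1.0]),
    "s": ([5.0, 5.0], [1.0, 1.0]),
    "sh": ([5.1, 4.9], [1.0, 1.0]),
}
print(cluster_phones(toy_models, 2))
```

    On the toy models the two vowel-like entries and the two fricative-like entries end up in separate clusters. In a real system the phone models would more likely be HMM states with Gaussian mixtures, for which the KL divergence has no closed form and would have to be approximated.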
  • Keywords
    acoustic signal processing; embedded systems; probability; speech processing; speech recognition; text analysis; Kullback-Leibler divergence; Mandarin command word; acoustic models; confidence measure; digit recognition tasks; embedded speech recognition; embedded system; language models; normalization term; phone-based grammar; phonetic clustering; reliable estimation; system resource; word posterior probability; Acoustic measurements; Acoustics; Approximation methods; Computational modeling; Grammar; Speech; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type

    conf

  • DOI
    10.1109/ISCSLP.2010.5684914
  • Filename
    5684914