• DocumentCode
    337464
  • Title
    Utterance verification using prosodic information for Mandarin telephone speech keyword spotting
  • Author
    Chen, Yeou-Jiunn ; Wu, Chung-Hsien ; Yan, Gwo-Lang
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    2
  • fYear
    1999
  • fDate
    15-19 Mar 1999
  • Firstpage
    697
  • Abstract
    In this paper, prosodic information, a distinctive and important feature of Mandarin speech, is used for utterance verification of Mandarin telephone speech. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 59 context-independent subsyllables (22 INITIALs and 37 FINALs in Mandarin speech) and one background/silence model are used as the basic recognition units. For utterance verification, 12 anti-subsyllable HMMs, 175 context-dependent prosodic HMMs, and five anti-prosodic HMMs are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. On a test set of 2400 conversational speech utterances from 20 speakers (12 male and 8 female), the proposed verification method yielded a false alarm rate of 17.8% at a false rejection rate of 8.5%. Furthermore, this method correctly rejected 90.4% of nonkeywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information can improve verification performance.
  • Keywords
    feature extraction; hidden Markov models; speech recognition; Mandarin speech; anti-prosodic HMM; anti-subsyllable HMM; background/silence model; context-dependent prosodic HMM; context-independent subsyllables; false alarm rate; false rejection; keyword recognition; keyword spotting; keyword verification function; phonetic-phase verification; prosodic information; prosodic-phase verification; telephone speech; two-stage strategy; utterance verification; Computer science; Context modeling; Data mining; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech recognition; Statistics; Telephony; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-5041-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1999.759762
  • Filename
    759762