• DocumentCode
    1133453
  • Title

    A multistage algorithm for spotting new words in speech

  • Author

    Dharanipragada, Satya ; Roukos, Salim

  • Author_Institution
    T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
  • Volume
    10
  • Issue
    8
  • fYear
    2002
  • fDate
    11/1/2002 12:00:00 AM
  • Firstpage
    542
  • Lastpage
    550
  • Abstract
    In this paper, we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a phone-ngram representation (indexing) stage and a coarse-to-detailed search stage for spotting a word/phone sequence in speech. The phone-ngram representation stage provides a phoneme-level representation of the speech that can be searched efficiently. We present a novel method for phoneme recognition that uses a vocabulary prefix tree to guide the creation of the phone-ngram index. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives us vocabulary independence and the desired accuracy and speed in wordspotting. Current lattice-based phoneme-matching algorithms are similar to the coarse-match step of our algorithm. We show that our combined algorithm gives a factor of two improvement over the coarse match alone. The algorithm has wide-ranging use in distributed and pervasive speech recognition applications such as audio-indexing, spoken message retrieval, and video-browsing.
  • Keywords
    speech recognition; acoustic match; audio-indexing; coarse search; coarse-match step; coarse-to-detailed search stage; combined algorithm; distributed speech recognition applications; fast vocabulary independent algorithm; indexing stage; lattice-based phoneme-matching algorithms; multistage algorithm; new words; pervasive speech recognition applications; phone-ngram index; phone-ngram matching; phone-ngram representation; phoneme-level representation; speech; spoken message retrieval; video-browsing; vocabulary prefix tree; word hits; word/phone sequence; Indexing; Information retrieval; Object detection; Runtime; Speech analysis; Speech recognition; Text recognition; Vocabulary
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/TSA.2002.804543
  • Filename
    1175526
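
  • Note
    The abstract describes an index-then-search pipeline: a phone-ngram index, a coarse match that proposes putative hits, and a detailed match run only at those hits. The sketch below is a minimal illustration of that control flow, not the authors' implementation. All function names, the n-gram order, and the thresholds are assumptions made for illustration. In particular, the paper's detailed stage is an acoustic match; a phone-level edit distance stands in for it here because no acoustic models are available in a short example.

```python
from collections import defaultdict


def phone_ngrams(phones, n):
    """Yield (start_index, ngram) pairs for a phone sequence."""
    for i in range(len(phones) - n + 1):
        yield i, tuple(phones[i:i + n])


def build_index(phones, n=2):
    """Indexing stage: map each phone n-gram to the positions where it occurs."""
    index = defaultdict(list)
    for i, gram in phone_ngrams(phones, n):
        index[gram].append(i)
    return index


def coarse_match(index, query, n=2, min_votes=1):
    """Coarse stage: each matching n-gram votes for a candidate start position."""
    votes = defaultdict(int)
    for offset, gram in phone_ngrams(query, n):
        for pos in index.get(gram, ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return sorted(s for s, v in votes.items() if v >= min_votes)


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def detailed_match(phones, query, candidates, max_dist=0):
    """Detailed stage (stand-in): rescore only the putative hits.

    Assumption: edit distance replaces the acoustic match used in the paper.
    """
    hits = []
    for start in candidates:
        window = phones[start:start + len(query)]
        dist = edit_distance(window, query)
        if dist <= max_dist:
            hits.append((start, dist))
    return hits


# Hypothetical phone string for "the cat sat on the mat" and a query for "cat".
phones = "dh ax k ae t s ae t aa n dh ax m ae t".split()
query = "k ae t".split()

index = build_index(phones, n=2)
candidates = coarse_match(index, query, n=2)
hits = detailed_match(phones, query, candidates)
```

    The coarse stage deliberately over-generates: partial n-gram overlap with "sat" and "mat" also produces candidates, and the more expensive second stage is what rejects them. Because the index is built over phones rather than words, the query does not have to be in any recognizer vocabulary.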