  • DocumentCode
    2330001
  • Title
    An efficient approach for two-stage open vocabulary spoken term detection
  • Author
    Norouzian, Atta ; Rose, Richard
  • Author_Institution
    Dept. of ECE, McGill Univ., Montreal, QC, Canada
  • fYear
    2010
  • fDate
    12-15 Dec. 2010
  • Firstpage
    194
  • Lastpage
    199
  • Abstract
    This paper investigates indexing strategies for open vocabulary spoken term detection (STD) in a lecture speech domain. STD is performed from word lattices generated offline using an automatic speech recognition (ASR) system configured from a meetings task domain. Indexing of lattice paths is performed to avoid exhaustive search of audio segments, which can be impractical for extremely large media repositories. The method is based on constructing a word-based index from these lattices and using an approximate subword-based algorithm for accessing index entries from subword expansions of query terms. Results are presented for an experimental study demonstrating both STD performance and the potential for scaling the indexing strategy to very large collections of audio segments.
  • Keywords
    indexing; speech processing; speech recognition; word processing; approximate subword based algorithm; audio segment; automatic speech recognition system; lattice path indexing; lecture speech domain; open vocabulary spoken term detection; two stage open vocabulary spoken term detection; word based index; spoken term detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2010 IEEE
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-7904-7
  • Electronic_ISBN
    978-1-4244-7902-3
  • Type
    conf
  • DOI
    10.1109/SLT.2010.5700850
  • Filename
    5700850