Title :
An efficient approach for two-stage open vocabulary spoken term detection
Author :
Norouzian, Atta ; Rose, Richard
Author_Institution :
Dept. of ECE, McGill Univ., Montreal, QC, Canada
Abstract :
This paper investigates indexing strategies for open vocabulary spoken term detection (STD) in a lecture speech domain. STD is performed from word lattices generated offline using an automatic speech recognition (ASR) system configured from a meetings task domain. Lattice paths are indexed to avoid an exhaustive search of audio segments, which can be impractical for extremely large media repositories. The method is based on constructing a word-based index from these lattices and using an approximate subword-based algorithm for accessing index entries from subword expansions of query terms. Results are presented for an experimental study demonstrating both STD performance and the potential for scaling the indexing strategy to very large collections of audio segments.
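The indexing idea in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: the toy lexicon, the arc tuple layout, the function names (build_index, edit_distance, search), and the use of plain Levenshtein distance as the approximate subword match are all assumptions made for illustration. Only the index lookup is shown; any later verification of the retrieved segments is omitted.

```python
from collections import defaultdict

# Hypothetical pronunciation lexicon (toy phone strings, not from the paper).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "lattice": ["l", "ae", "t", "ih", "s"],
}

def build_index(lattice_arcs):
    """Map each lattice word to its (segment_id, start, end, posterior) entries."""
    index = defaultdict(list)
    for segment_id, word, start, end, posterior in lattice_arcs:
        index[word].append((segment_id, start, end, posterior))
    return index

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def search(index, query_phones, max_distance=1):
    """Return entries of index words whose phones lie near the query phones.

    Each hit is (word, distance, segment_id, start, end, posterior), ordered
    by increasing distance and then by decreasing posterior.
    """
    hits = []
    for word, entries in index.items():
        phones = LEXICON.get(word)
        if phones is None:
            continue  # no subword expansion available for this index word
        dist = edit_distance(query_phones, phones)
        if dist <= max_distance:
            hits.extend((word, dist) + entry for entry in entries)
    return sorted(hits, key=lambda h: (h[1], -h[5]))

# Toy lattice arcs: (segment_id, word, start_sec, end_sec, posterior).
arcs = [
    ("lec01", "speech", 1.2, 1.6, 0.80),
    ("lec01", "speed", 1.2, 1.6, 0.15),
    ("lec02", "lattice", 3.0, 3.5, 0.90),
]
index = build_index(arcs)

# Query given as phones, so it need not be a word in the ASR vocabulary.
candidates = search(index, ["s", "p", "iy", "k"])
```

Because the query is matched at the phone level, a term missing from the recognizer vocabulary can still reach index entries for acoustically similar in-vocabulary words. The linear scan over index words is for brevity only; it does not reflect how the paper scales to large collections.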
Keywords :
indexing; speech processing; speech recognition; word processing; approximate subword based algorithm; audio segment; automatic speech recognition system; lattice path indexing; lecture speech domain; open vocabulary spoken term detection; two stage open vocabulary spoken term detection; word based index; spoken term detection
Conference_Title :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
DOI :
10.1109/SLT.2010.5700850