DocumentCode :
180480
Title :
Efficient spoken term detection using confusion networks
Author :
Mangu, Lidia ; Kingsbury, Brian ; Soltau, Hagen ; Kuo, Hong-Kwang ; Picheny, Michael
Author_Institution :
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
7844
Lastpage :
7848
Abstract :
In this paper, we present a fast, vocabulary-independent algorithm for spoken term detection (STD) that demonstrates that a word-based index is sufficient to achieve good performance for both in-vocabulary (IV) and out-of-vocabulary (OOV) terms. Previous approaches have required that a separate index be built at the sub-word level and then expanded to allow for matching OOV terms. Such a process, while accurate, is expensive in both time and memory. In the proposed architecture, a word-level confusion network (CN) based index is used for both IV and OOV search. This is implemented using a flexible WFST framework. Comparisons on three Babel languages (Tagalog, Pashto and Turkish) show that CN-based indexing performs better than the lattice approach while being orders of magnitude faster and having a much smaller footprint.
Keywords :
speech processing; vocabulary; 3 Babel language; CN-based indexing; IV term; OOV term; Pashto; STD; Tagalog; Turkish; flexible WFST framework; in-vocabulary term; out-of-vocabulary term; spoken term detection; vocabulary independent algorithm; word-based index; word-level confusion network; Acoustics; Indexing; Lattices; Speech; Transducers; Vocabulary; audio indexing; confusion networks; keyword search; keyword spotting
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6855127
Filename :
6855127