DocumentCode :
3585075
Title :
Subword scheme for keyword search
Author :
Zhipeng Chen ; Teng Zhang ; Ji Wu
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2014
Firstpage :
483
Lastpage :
488
Abstract :
Keyword search (KWS) is an important application of spoken language technology. Large Vocabulary Continuous Speech Recognition (LVCSR) plays an important role in KWS systems. However, for a language with a large vocabulary and a relatively limited text corpus, the vocabulary size keeps growing rapidly as the amount of text increases, as we observed for Tamil. This makes it difficult to train a reliable language model, which may degrade KWS performance. Subword units have been successfully employed in KWS systems to handle the out-of-vocabulary (OOV) problem. Inspired by this, we propose a novel subword scheme, designed from the perspective of pronunciation, to alleviate the large-vocabulary problem. We find that the subword-based system outperforms our best word-based system on Tamil conversational telephone speech. System combination experiments show that, when added to the best word-based system, a single subword-based system contributes more complementary information than the other three word-based systems combined.
Keywords :
natural language processing; speech recognition; text analysis; KWS system; LVCSR technique; OOV problem; Tamil conversational telephone speech; Tamil language; keyword search; large-vocabulary continuous speech recognition technique; large-vocabulary language; out-of-vocabulary problem; pronunciation perspective; spoken language technology; subword unit; subword-based system; text corpus; vocabulary size; Hidden Markov models; Keyword search; Speech; Speech recognition; Training; Training data; Vocabulary; large vocabulary; subword
fLanguage :
English
Publisher :
ieee
Conference_Title :
Spoken Language Technology Workshop (SLT), 2014 IEEE
Type :
conf
DOI :
10.1109/SLT.2014.7078622
Filename :
7078622