Title :
Out-of-vocabulary term detection by n-gram array with distance from continuous syllable recognition results
Author :
Iwami, Keisuke ; Fujii, Yasuhisa ; Yamamoto, Kazumasa ; Nakagawa, Seiichi
Author_Institution :
Dept. of Comput. Sci. & Eng., Toyohashi Univ. of Technol., Toyohashi, Japan
Abstract :
For spoken document retrieval, it is very important to handle Out-of-Vocabulary (OOV) words and mis-recognition of spoken words. Sub-word unit based recognition and retrieval methods have therefore been proposed. This paper describes a Japanese spoken document retrieval system that is robust to OOV words and to mis-recognition of sub-word units. To solve the problem of OOV keywords and mis-recognized words, we use individual syllables as the sub-word unit in continuous speech recognition and an n-gram sequence of syllables as the retrieval unit. We propose an n-gram indexing/retrieval method with distance in a syllable lattice to address OOV words, recognition errors, and high-speed retrieval. We applied this method to a 44-hour database of academic lecture presentations, and 60% of the OOV words were detected in less than 2.5 milliseconds.
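The idea of indexing syllable n-grams and retrieving them with a distance tolerance can be illustrated with a small sketch. This is not the authors' implementation: it assumes 1-best syllable sequences rather than a syllable lattice, uses a plain dictionary instead of an n-gram array, counts only substitutions (Hamming distance), and scans all indexed n-grams linearly. The function names, the romanized syllables, and the parameters `n` and `max_dist` are hypothetical.

```python
from collections import defaultdict


def ngrams(syllables, n):
    """Return (position, n-gram) pairs for a syllable sequence."""
    return [(i, tuple(syllables[i:i + n]))
            for i in range(len(syllables) - n + 1)]


def build_index(documents, n=3):
    """Map each syllable n-gram to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, syllables in documents.items():
        for pos, gram in ngrams(syllables, n):
            index[gram].append((doc_id, pos))
    return index


def hamming(a, b):
    """Number of differing syllables between two equal-length n-grams."""
    return sum(x != y for x, y in zip(a, b))


def search(index, query, n=3, max_dist=1):
    """Find start positions where every query n-gram matches an indexed
    n-gram within max_dist substitutions (assumed, simplified matching).

    Returns {(doc_id, start): total substitution distance}.
    """
    query_grams = ngrams(query, n)
    votes = defaultdict(list)
    for offset, q in query_grams:
        # Linear scan for clarity; a real system would avoid this.
        for gram, postings in index.items():
            d = hamming(q, gram)
            if d <= max_dist:
                for doc_id, pos in postings:
                    # Align the hit back to the implied start of the query.
                    votes[(doc_id, pos - offset)].append(d)
    return {key: sum(ds) for key, ds in votes.items()
            if len(ds) == len(query_grams)}


# Toy recognizer output: romanized syllable sequences per lecture (made up).
docs = {
    "lec1": ["to", "yo", "ha", "shi", "da", "i"],
    "lec2": ["to", "yo", "ka", "shi", "no"],
}
index = build_index(docs, n=3)
hits = search(index, ["yo", "ha", "shi"], n=3, max_dist=1)
```

In this toy data, the query matches `lec1` exactly and `lec2` with one substituted syllable, which mimics tolerating a recognition error. Because the query is a syllable sequence rather than a dictionary word, a term outside the recognizer's vocabulary can still be searched.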
Keywords :
document handling; indexing; information retrieval; speech recognition; Japanese spoken document retrieval system; continuous syllable recognition; n-gram array; n-gram indexing-retrieval method; out-of-vocabulary term detection; Out-of-Vocabulary; mis-recognition; n-gram; spoken term retrieval; syllable recognition;
Conference_Title :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
DOI :
10.1109/SLT.2010.5700853