Title :
Linguistic stem concatenation for Malay large vocabulary continuous speech recognition
Author :
Sze, Hong Kai ; Ping, Tan Tien ; Kong, Tang Enya ; Yu-N, Cheah
Author_Institution :
Fac. of Eng. & Sci., Uni. Tunku Abdul, Kuala Lumpur, Malaysia
Abstract :
This paper introduces a new stem concatenation method, based on linguistic knowledge, to improve a Malay large vocabulary continuous speech recognition (LVCSR) system. Malay is an agglutinative language that forms new words by combining various linguistic stems. As a result, the hypothesized texts contain a number of floating stems. The proposed method concatenates these floating stems with their base words using Malay linguistic knowledge. On a Malay LVCSR system with a smaller training set of 11 thousand sentences, the proposed method reduces the word error rate by 0.3%. The improvements are consistent across all tested smoothing methods and training transcription sizes.
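The concatenation step described in the abstract might be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not give the stem inventory or the linguistic rules, so the affix sets and the function name `concatenate_floating_stems` are assumptions made for this example. Here a floating prefix is attached to the word that follows it, and a floating suffix to the word that precedes it.

```python
# Hypothetical affix inventories, chosen for illustration only.
# The paper's actual stem list and linguistic rules are not given in the abstract.
FLOATING_PREFIXES = {"me", "mem", "men", "meng", "ber", "ter"}
FLOATING_SUFFIXES = {"kan", "nya", "lah"}


def concatenate_floating_stems(tokens):
    """Join floating prefixes to the next word and floating suffixes to the previous word."""
    merged = []
    pending_prefix = ""
    for token in tokens:
        if token in FLOATING_PREFIXES:
            # Hold the prefix until the next base word arrives.
            pending_prefix += token
            continue
        if token in FLOATING_SUFFIXES and merged and not pending_prefix:
            # Attach the suffix to the most recent word.
            merged[-1] += token
            continue
        merged.append(pending_prefix + token)
        pending_prefix = ""
    if pending_prefix:
        # A trailing prefix with no base word is kept as its own token.
        merged.append(pending_prefix)
    return merged


hypothesis = "dia mem baca buku nya".split()
print(" ".join(concatenate_floating_stems(hypothesis)))
```

A purely list-based rule like this would likely over-merge in practice, since some Malay affix forms (for example "di" and "ke") also occur as standalone words; those forms are deliberately left out of the hypothetical sets above.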
Keywords :
linguistics; speech recognition; vocabulary; LVCSR; Malay large vocabulary continuous speech recognition; linguistic stem concatenation method; agglutinative language; language modeling; linguistic stem
Conference_Title :
Research and Development (SCOReD), 2010 IEEE Student Conference on
Conference_Location :
Putrajaya
Print_ISBN :
978-1-4244-8647-2
DOI :
10.1109/SCORED.2010.5703990