DocumentCode :
2381535
Title :
Linguistic stem concatenation for Malay large vocabulary continuous speech recognition
Author :
Sze, Hong Kai ; Ping, Tan Tien ; Kong, Tang Enya ; Yu-N, Cheah
Author_Institution :
Fac. of Eng. & Sci., Uni. Tunku Abdul, Kuala Lumpur, Malaysia
fYear :
2010
fDate :
13-14 Dec. 2010
Firstpage :
144
Lastpage :
148
Abstract :
This paper introduces a new stem concatenation method, based on linguistic knowledge, to improve a Malay large vocabulary speech recognition system. Malay is an agglutinative language that forms new words from various linguistic stems. As a result, the hypothesized texts contain a number of floating stems. The proposed method concatenates these floating stems with the base words using Malay linguistic knowledge. Applied to Malay LVCSR with a smaller training set of 11 thousand sentences, the method reduces the word error rate by 0.3%. The improvements are consistent across all smoothing methods and training transcription sizes tested.
Keywords :
linguistics; speech recognition; vocabulary; LVCSR; Malay large vocabulary continuous speech recognition; linguistic stem concatenation method; agglutinative language; language modeling; linguistic stem
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Research and Development (SCOReD), 2010 IEEE Student Conference on
Conference_Location :
Putrajaya
Print_ISBN :
978-1-4244-8647-2
Type :
conf
DOI :
10.1109/SCORED.2010.5703990
Filename :
5703990