Title :
A POS-based fuzzy word clustering algorithm for continuous speech recognition systems
Author :
Momtazi, S. ; Sameti, H. ; Bahrani, M. ; Hafezi, N.
Author_Institution :
Comput. Eng. Dept., Sharif Univ. of Technol., Tehran
Abstract :
Word-based n-gram language models are widely used in continuous speech recognition systems. Such models must be extracted from large corpora. Because Persian corpora are not rich, the language models extracted from them are not reliable. For this reason, most researchers extract class n-grams instead of word n-grams. This research presents a new idea for fuzzy word clustering in which each word can be assigned to more than one class. The fuzzy c-means algorithm is used for clustering, and we have examined its various parameters. Finally, the algorithm was applied to the 20,000 most frequent Persian words extracted from the "Persian Text Corpus". The extracted language models are evaluated using the perplexity criterion, and the results show a considerable reduction in perplexity. The language model was also evaluated in a speaker-independent continuous speech recognition system, where it improved the system's accuracy.
Keywords :
fuzzy set theory; languages; pattern clustering; speech recognition; Persian text corpus; Persian word; fuzzy c-mean algorithm; fuzzy word clustering algorithm; language models; speech recognition systems; Clustering algorithms; Clustering methods; Fuzzy systems; Helium; Natural languages; Probability; Speech processing; Speech recognition; Statistics; Training data;
Conference_Title :
Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-0778-1
Electronic_ISBN :
978-1-4244-1779-8
DOI :
10.1109/ISSPA.2007.4555528