• DocumentCode
    1989325
  • Title

    A POS-based fuzzy word clustering algorithm for continuous speech recognition systems

  • Author

    Momtazi, S. ; Sameti, H. ; Bahrani, M. ; Hafezi, N.

  • Author_Institution
    Comput. Eng. Dept., Sharif Univ. of Technol., Tehran
  • fYear
    2007
  • fDate
    12-15 Feb. 2007
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Word-based n-gram language models are prevalent in continuous speech recognition systems. Such models must be extracted from large corpora. Because Persian corpora are not rich, the extracted language models are not reliable. For this reason, most researchers extract class n-grams instead of word n-grams. This research presents a new idea for fuzzy word clustering in which each word can be assigned to more than one class. The fuzzy c-means algorithm is used as the clustering method, and its various parameters are examined. The algorithm was applied to the 20,000 most frequent Persian words extracted from the "Persian Text Corpus". The extracted language models are evaluated by the perplexity criterion, and the results show a considerable reduction in perplexity. The language model was also evaluated in a speaker-independent continuous speech recognition system, where it improved the system accuracy.
  • Keywords
    fuzzy set theory; languages; pattern clustering; speech recognition; Persian text corpus; Persian word; fuzzy c-mean algorithm; fuzzy word clustering algorithm; language models; speech recognition systems; Clustering algorithms; Clustering methods; Fuzzy systems; Helium; Natural languages; Probability; Speech processing; Speech recognition; Statistics; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
  • Conference_Location
    Sharjah
  • Print_ISBN
    978-1-4244-0778-1
  • Electronic_ISBN
    978-1-4244-1779-8
  • Type

    conf

  • DOI
    10.1109/ISSPA.2007.4555528
  • Filename
    4555528
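
The abstract describes fuzzy word clustering in which each item can belong to more than one class. The block below is a minimal sketch of the standard fuzzy c-means update rules, not the authors' implementation. The function name `fuzzy_c_means`, its parameters (`m`, `n_iter`, `tol`, `seed`) and the toy 2-D points are assumptions made for illustration; the record does not specify the POS-based word features or the parameter values used in the paper.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length feature vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    The fuzzifier m must be greater than 1 (hypothetical default: 2.0).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                 for d in range(dim)]
            )

        # Membership update: proportional to squared distance ** (-1 / (m - 1)).
        max_change = 0.0
        for i, p in enumerate(points):
            # Floor the squared distance so a point on a center does not divide by zero.
            d2 = [max(sum((a - b) ** 2 for a, b in zip(p, c)), 1e-12)
                  for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new_row = [v / total for v in inv]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if max_change < tol:
            break

    return centers, u


# Toy data: two well-separated groups of 2-D points (illustrative only).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(toy_points, n_clusters=2)
```

Unlike hard clustering, every point keeps a graded membership in every cluster, which is what allows a word to contribute to more than one class when class n-gram statistics are estimated.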