• DocumentCode
    2836133
  • Title

    A Possibilistic Approach for Building Statistical Language Models

  • Author

    Momtazi, Saeedeh ; Sameti, Hossein

  • Author_Institution
    Univ. of Saarland, Saarbrucken, Germany
  • fYear
    2009
  • fDate
    Nov. 30 2009-Dec. 2 2009
  • Firstpage
    1014
  • Lastpage
    1018
  • Abstract
    Class-based n-gram language models are the ones most frequently used in continuous speech recognition systems, especially for languages for which no richly annotated corpora are available. Various word clustering algorithms have been proposed to build such class-based models. In this work, we discuss the superiority of soft approaches to class construction, whereby each word can be assigned to more than one class. We also propose a new method for possibilistic word clustering. The possibilistic C-mean algorithm is used as our clustering method. Various parameters of this algorithm are investigated, e.g., centroid initialization, distance measure, and the words' feature vectors. In the experiments reported here, this algorithm is applied to the 20,000 most frequent Persian words, and the language model built with the clusters created in this fashion is evaluated based on its perplexity and the accuracy of a continuous speech recognition system. Our results indicate a 10% reduction in perplexity and a 4% reduction in word error rate.
  • Keywords
    computational linguistics; natural language processing; pattern clustering; speech recognition; Persian words; class construction; class-based n-gram language models; continuous speech recognition systems; possibilistic C-mean algorithm; possibilistic word clustering; soft approaches; statistical language models; Buildings; Clustering algorithms; Clustering methods; Error analysis; History; Intelligent structures; Intelligent systems; Natural languages; Speech recognition; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-1-4244-4735-0
  • Electronic_ISBN
    978-0-7695-3872-3
  • Type

    conf

  • DOI
    10.1109/ISDA.2009.197
  • Filename
    5364438