• DocumentCode
    2997641
  • Title

    Statistical language modeling using a small corpus from an application domain

  • Author

    Rohlicek, Jan R. ; Chow, Yen-Lu ; Roucos, Salim

  • Author_Institution
    BBN Lab. Inc., Cambridge, MA, USA
  • fYear
    1988
  • fDate
    11-14 Apr 1988
  • Firstpage
    267
  • Abstract
    Statistical language models have been successfully used to improve the performance of continuous speech recognition algorithms. Application of such techniques is difficult when only a small training corpus is available. The authors present an approach for dealing with the limited training data available from the DARPA resource management domain. An initial training corpus of sentences was abstracted by replacing sentence fragments or phrases with variables. This training corpus of phrase sequences was used to derive the parameters of a Markov model. The probability of a word sequence is then decomposed into the probability of possible phrase sequences and the probabilities of the word sequences within each of the phrases. Initial results obtained on 150 utterances from six speakers in the DARPA database indicate that this language modeling technique has potential for improved recognition performance. Furthermore, this approach provides a framework for incorporating linguistic knowledge into statistical language models.
  • Keywords
    Markov processes; linguistics; speech recognition; DARPA resource management domain; Markov model; continuous speech recognition algorithms; linguistic knowledge; phrase sequences; small training corpus; statistical language models; word sequence probability; Acoustic measurements; Character generation; Contracts; Database systems; Laboratories; Management training; Natural languages; Probability; Resource management; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on
  • Conference_Location
    New York, NY
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.1988.196567
  • Filename
    196567