• DocumentCode
    336820
  • Title
    Combination of words and word categories in varigram histories
  • Author
    Blasig, Reinhard
  • Author_Institution
    Philips Res. Lab., Aachen, Germany
  • Volume
    1
  • fYear
    1999
  • fDate
    15-19 Mar 1999
  • Firstpage
    529
  • Abstract
    This paper presents a new kind of language model: category/word varigrams. This special model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
  • Keywords
    computational linguistics; natural languages; 1994 ARPA evaluation data; WER; WSJ0 corpus; category-based modeling; category/word varigrams; language model; perplexity reduction; varigram histories; word categories; word error rate; word history; word sequences; word-based modeling; words; Educational technology; Error analysis; History; Interpolation; Laboratories; Natural languages; Predictive models; Probability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-5041-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1999.758179
  • Filename
    758179