• DocumentCode
    2180680
  • Title
    Subsequence similarity language models
  • Author
    Huerta, Juan M.
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5580
  • Lastpage
    5583
  • Abstract
    In this work we present the Subsequence Similarity Language Model (S2-LM), a new approach to language modeling based on string similarity. As a language model, S2-LM generates scores based on the closest matching string in a very large corpus. In this paper we describe the properties and advantages of our approach, as well as efficient methods to carry out its computation. We also describe an n-best rescoring experiment intended to show that S2-LM can be adjusted to behave as an n-gram SLM model.
  • Keywords
    formal languages; string matching; S2-LM; n-best rescoring experiment; n-gram SLM model; string similarity; subsequence similarity language models; language models; longest common subsequence
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947624
  • Filename
    5947624