Subsequence similarity language models

Author

Huerta, Juan M.

Author_Institution

IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

fYear

2011

fDate

22-27 May 2011

Firstpage

5580

Lastpage

5583

Abstract

In this work we present the Subsequence Similarity Language Model (S2-LM), a new approach to language modeling based on string similarity. As a language model, S2-LM scores a string according to its closest match in a very large corpus. We describe the properties and advantages of the approach, along with efficient methods for computing it. We also describe an n-best rescoring experiment intended to show that S2-LM can be adjusted to behave like an n-gram SLM.
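
The closest-match scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes whitespace tokenization, a longest-common-subsequence (LCS) similarity normalized by the longer sequence, and a brute-force scan of a small in-memory corpus rather than the efficient methods the paper refers to. The names lcs_length and similarity_score are hypothetical.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    # Classic dynamic program, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_score(hypothesis: str, corpus: List[str]) -> Tuple[float, str]:
    """Return (best normalized LCS score, closest corpus sentence).

    Assumption: the score is the LCS length divided by the length of the
    longer of the two token sequences, so it lies in [0, 1].
    """
    hyp = hypothesis.split()
    best_score, best_sentence = 0.0, ""
    for sentence in corpus:  # brute-force scan, for illustration only
        ref = sentence.split()
        denom = max(len(hyp), len(ref))
        score = lcs_length(hyp, ref) / denom if denom else 0.0
        if score > best_score:
            best_score, best_sentence = score, sentence
    return best_score, best_sentence
```

In an n-best rescoring setting, a score of this kind could be computed for each hypothesis and combined with the other scores in the list. How the paper weights or calibrates the score is not stated in this record.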

Keywords

formal languages; string matching; S2-LM; n-best rescoring experiment; n-gram SLM model; string similarity; subsequence similarity language models; language models; longest common subsequence

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location

Prague

ISSN

1520-6149

Print_ISBN

978-1-4577-0538-0

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2011.5947624

Filename

5947624

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2180680