• DocumentCode
    1274377
  • Title

    FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-Gram Language Model

  • Author

    Yuret, Deniz

  • Author_Institution
    Dept. of Comput. Eng., Koc Univ., İstanbul, Turkey
  • Volume
    19
  • Issue
    11
  • fYear
    2012
  • Firstpage
    725
  • Lastpage
    728
  • Abstract
    Lexical substitutes have found use in areas such as paraphrasing, text simplification, machine translation, word sense disambiguation, and part of speech induction. However the computational complexity of accurately identifying the most likely substitutes for a word has made large scale experiments difficult. In this letter we introduce a new search algorithm, FASTSUBS, that is guaranteed to find the K most likely lexical substitutes for a given word in a sentence based on an n-gram language model. The computation is sublinear in both K and the vocabulary size V. An implementation of the algorithm and a dataset with the top 100 substitutes of each token in the WSJ section of the Penn Treebank are available at http://goo.gl/jzKH0.
  • Keywords
    computational complexity; natural language processing; search problems; trees (mathematics); vocabulary; FASTSUBS; N-gram language model; Penn treebank; WSJ section; computational complexity; large scale experiments; lexical substitutes; machine translation; n-gram language model; paraphrasing; part of speech induction; search algorithm; text simplification; vocabulary size; word sense disambiguation; Context; Equations; Mathematical model; Probability; Signal processing algorithms; Upper bound; Vocabulary; Lexical substitutes; statistical language modeling;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Letters, IEEE
  • Publisher
    ieee
  • ISSN
    1070-9908
  • Type

    jour

  • DOI
    10.1109/LSP.2012.2215587
  • Filename
    6287552