• DocumentCode
    167138
  • Title

    Shallow parsing on word lattices

  • Author

    Zaborowski, Bartosz

  • Author_Institution
    Inst. of Comput. Sci., Poland
  • fYear
    2014
  • fDate
    11-13 April 2014
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The article presents preliminary results of a project which aims at developing a new algorithm for shallow parsing of a natural language. The main feature is that the algorithm allows efficient processing of word lattices. The main application of this feature will be a new rule-based approach to automatic speech recognition: with scoring sentence candidates for a given utterance with regard to their accordance with grammar of a particular natural language. The algorithm is being implemented on a basis of a shallow parsing and morphosyntactic disambiguation system Spejd. The Spejd formalism is designed to process a single (linear) sentence at once. The naive baseline approach to the task of scoring each of sentence candidates in a word lattice would be to process each of the candidates sequentially, after extracting from the lattice. Unfortunately, the time complexity of the naive approach is exponential in the size of the lattice. Hence, the basic approach is not applicable to real-life data. The rule pattern matching algorithm which is presented in the article uses Finite-State techniques to process the whole word lattice at once in polynomial time and space. It combines optimizations of processing of linear sentences used in the existing Spejd implementation with new ideas, specific to searching for regular patterns in word lattice. The improvements follow the idea of a Non-deterministic Finite-State Automaton simulation without backtracking presented by Ville Laurikari. This idea was extended to handle not only a bunch of NFA threads at once, but also to simultaneously process multiple paths passing through a particular word (a lattice node). The article also discusses an efficient application of RHS operations directly on a word lattice, which turns out not to be trivial due to the high expressive power of the Spejd formalism. The article is supplemented by a preliminary speed evaluation of the new implementation.
  • Keywords
    computational complexity; finite state machines; grammars; knowledge based systems; lattice theory; natural language processing; pattern matching; speech recognition; RHS operations; Spejd formalism; automatic speech recognition; finite-state techniques; grammar; lattice node; linear sentence processing optimization; morphosyntactic disambiguation system; natural language; nondeterministic finite-state automaton simulation; polynomial time complexity; rule pattern matching algorithm; rule-based approach; sentence candidate scoring; shallow parsing; word lattice processing; Automata; Complexity theory; Grammar; Lattices; Pattern matching; Speech recognition; Syntactics;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Pacific Voice Conference (PVC), 2014 XXII Annual
  • Conference_Location
    Krakow
  • Print_ISBN
    978-1-4799-3699-1
  • Type

    conf

  • DOI
    10.1109/PVC.2014.6845417
  • Filename
    6845417