DocumentCode :
167138
Title :
Shallow parsing on word lattices
Author :
Zaborowski, Bartosz
Author_Institution :
Inst. of Comput. Sci., Poland
fYear :
2014
fDate :
11-13 April 2014
Firstpage :
1
Lastpage :
4
Abstract :
The article presents preliminary results of a project which aims at developing a new algorithm for shallow parsing of a natural language. The main feature is that the algorithm allows efficient processing of word lattices. The main application of this feature will be a new rule-based approach to automatic speech recognition: with scoring sentence candidates for a given utterance with regard to their accordance with grammar of a particular natural language. The algorithm is being implemented on a basis of a shallow parsing and morphosyntactic disambiguation system Spejd. The Spejd formalism is designed to process a single (linear) sentence at once. The naive baseline approach to the task of scoring each of sentence candidates in a word lattice would be to process each of the candidates sequentially, after extracting from the lattice. Unfortunately, the time complexity of the naive approach is exponential in the size of the lattice. Hence, the basic approach is not applicable to real-life data. The rule pattern matching algorithm which is presented in the article uses Finite-State techniques to process the whole word lattice at once in polynomial time and space. It combines optimizations of processing of linear sentences used in the existing Spejd implementation with new ideas, specific to searching for regular patterns in word lattice. The improvements follow the idea of a Non-deterministic Finite-State Automaton simulation without backtracking presented by Ville Laurikari. This idea was extended to handle not only a bunch of NFA threads at once, but also to simultaneously process multiple paths passing through a particular word (a lattice node). The article also discusses an efficient application of RHS operations directly on a word lattice, which turns out not to be trivial due to the high expressive power of the Spejd formalism. The article is supplemented by a preliminary speed evaluation of the new implementation.
Keywords :
computational complexity; finite state machines; grammars; knowledge based systems; lattice theory; natural language processing; pattern matching; speech recognition; RHS operations; Spejd formalism; automatic speech recognition; finite-state techniques; grammar; lattice node; linear sentence processing optimization; morphosyntactic disambiguation system; natural language; nondeterministic finite-state automaton simulation; polynomial time complexity; rule pattern matching algorithm; rule-based approach; sentence candidate scoring; shallow parsing; word lattice processing; Automata; Complexity theory; Grammar; Lattices; Pattern matching; Speech recognition; Syntactics;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Pacific Voice Conference (PVC), 2014 XXII Annual
Conference_Location :
Krakow
Print_ISBN :
978-1-4799-3699-1
Type :
conf
DOI :
10.1109/PVC.2014.6845417
Filename :
6845417
Link To Document :
بازگشت