Title :
Efficient geometric algorithms for parsing in two dimensions
Author :
Liang, Percy ; Narasimhan, Mukund ; Shilman, Michael ; Viola, Paul
Author_Institution :
Microsoft Res., One Microsoft Way, Redmond, WA, USA
fDate :
29 Aug.-1 Sept. 2005
Abstract :
Grammars are a powerful technique for modeling and extracting the structure of documents. One large challenge, however, is computational complexity. The computational cost of grammatical parsing is related to both the complexity of the input and the ambiguity of the grammar. For programming languages, where the terminals appear in a linear sequence and the grammar is unambiguous, parsing is O(N). For natural languages, which are linear yet have an ambiguous grammar, parsing is O(N3). For documents, where the terminals are arranged in two dimensions and the grammar is ambiguous, parsing time can be exponential in the number of terminals. In this paper we introduce (and unify) several types of geometrical data structures which can be used to significantly accelerate parsing time. Each data structure embodies a different geometrical constraint on the set of possible valid parses. These data structures are very general, in that they can be used by any type of grammatical model, and a wide variety of document understanding tasks, to limit the set of hypotheses examined and tested. Assuming a clean design for the parsing software, the same parsing framework can be tested with various geometric constraints to determine the most effective combination.
Keywords :
computational complexity; document handling; grammars; natural languages; ambiguous grammar; computational complexity; document structures; geometric algorithms; geometrical data structures; grammars; grammatical parsing; natural languages; programming languages; Acceleration; Computational complexity; Computational efficiency; Computer languages; Data structures; History; Natural languages; Paper technology; Software testing; Text analysis;
Conference_Titel :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
Print_ISBN :
0-7695-2420-6
DOI :
10.1109/ICDAR.2005.98