Title :
Layout and language: exploring text block discovery in tables using linguistic resources
Author_Institution :
WhizBang!Labs, Pittsburgh, PA, USA
fDate :
2001
Abstract :
Identifying the textual content of table cells requires, in part, resolving ambiguities that confuse multi-row cells with single-row cells, as well as other layout-based ambiguities. This paper investigates the application of linguistic resources to this problem and discusses algorithms that exploit both phrasal dictionaries and bigram language models to discover the content of cells in flat text files.
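The abstract does not spell out the algorithms. Purely as an illustration, the Python sketch below shows one way a phrasal dictionary and a bigram language model could be combined to guess whether two vertically adjacent text fragments in a flat text table belong to one multi-row cell. The names `BigramModel`, `spans_phrase`, and `should_merge`, the add-one smoothing, and the decision rule are hypothetical assumptions, not the authors' method.

```python
from collections import Counter

# Hypothetical sketch: not the algorithm described in the paper.


class BigramModel:
    """Add-one (Laplace) smoothed bigram model over lower-cased cell text.

    The pseudo-token "<s>" marks the start of a cell, so prob("<s>", w)
    estimates how likely w is to open a new cell.
    """

    def __init__(self, cells):
        self.histories = Counter()  # times each token is followed by another token
        self.bigrams = Counter()
        words = set()
        for cell in cells:
            tokens = ["<s>"] + cell.lower().split()
            words.update(tokens[1:])
            self.histories.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = max(1, len(words))  # avoid division by zero on empty input

    def prob(self, prev, word):
        # (count(prev, word) + 1) / (count(prev as history) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.histories[prev] + self.vocab_size)


def spans_phrase(last, first, phrases):
    """True if some dictionary phrase contains the word pair bridging the row break."""
    for phrase in phrases:
        tokens = phrase.lower().split()
        if (last, first) in zip(tokens, tokens[1:]):
            return True
    return False


def should_merge(upper, lower, phrases, model):
    """Guess whether two vertically adjacent fragments form one multi-row cell."""
    upper_tokens, lower_tokens = upper.lower().split(), lower.lower().split()
    if not upper_tokens or not lower_tokens:
        return False
    last, first = upper_tokens[-1], lower_tokens[0]
    # 1. Phrasal dictionary: a known phrase bridging the break favours merging.
    if spans_phrase(last, first, phrases):
        return True
    # 2. Bigram model: does `first` read better as a continuation of `upper`
    #    or as the first word of a new cell?
    return model.prob(last, first) > model.prob("<s>", first)


model = BigramModel(["total operating revenue", "total assets", "net income", "revenue"])
print(should_merge("Total", "operating revenue", set(), model))                 # True
print(should_merge("Total assets", "Net income", set(), model))                 # False
print(should_merge("Net income", "per share", {"net income per share"}, model))  # True
```

Checking the dictionary first reflects the assumption that an exact phrase match is stronger evidence than a smoothed bigram estimate; a real system would likely weigh the two sources and use far larger training data.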
Keywords :
dictionaries; document image processing; linguistics; bigram language models; document representation; experiments; flat text files; linguistic resources; phrasal dictionaries; table cell textual content; table recognition; text block discovery; textual layout; Company reports; Computational linguistics; Dictionaries; Encoding; Investments; Security; Testing; Text recognition; USA Councils
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953844