DocumentCode
1582154
Title
Layout and language: exploring text block discovery in tables using linguistic resources
Author
Hurst, Matthew
Author_Institution
WhizBang!Labs, Pittsburgh, PA, USA
fYear
2001
fDate
2001
Firstpage
523
Lastpage
527
Abstract
Identifying the textual content of table cells requires, in part, the successful resolution of ambiguities that confuse multi-row cells with single-row cells, as well as the resolution of other layout-based ambiguities. This paper investigates the application of linguistic resources to this problem and discusses algorithms that exploit both phrasal dictionaries and bigram language models for discovering the content of cells in flat text files.
Keywords
dictionaries; document image processing; linguistics; bigram language models; document representation; experiments; flat text files; linguistic resources; phrasal dictionaries; table cell textual content; table recognition; text block discovery; textual layout; Company reports; Computational linguistics; Dictionaries; Encoding; Investments; Security; Testing; Text recognition; USA Councils
fLanguage
English
Publisher
IEEE
Conference_Title
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location
Seattle, WA
Print_ISBN
0-7695-1263-1
Type
conf
DOI
10.1109/ICDAR.2001.953844
Filename
953844