DocumentCode :
1582154
Title :
Layout and language: exploring text block discovery in tables using linguistic resources
Author :
Hurst, Matthew
Author_Institution :
WhizBang!Labs, Pittsburgh, PA, USA
fYear :
2001
fDate :
2001
Firstpage :
523
Lastpage :
527
Abstract :
Identifying the textual content of table cells requires, in part, the successful resolution of ambiguities confusing multi-row cells and single-row cells, as well as the resolution of other layout-based ambiguities. This paper investigates the application of linguistic resources to this problem and discusses algorithms that exploit both phrasal dictionaries and bigram language models for discovering the content of cells in flat text files.
Keywords :
dictionaries; document image processing; linguistics; bigram language models; document representation; experiments; flat text files; linguistic resources; phrasal dictionaries; table cell textual content; table recognition; text block discovery; textual layout; Company reports; Computational linguistics; Dictionaries; Encoding; Investments; Security; Testing; Text recognition; USA Councils
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
Type :
conf
DOI :
10.1109/ICDAR.2001.953844
Filename :
953844