Title :
Applying the T-Recs table recognition system to the business letter domain
Author :
Kieninger, Thomas ; Dengel, Andreas
Author_Institution :
Inf. Manage. Dept., DFKI GmbH, Kaiserslautern, Germany
fDate :
2001
Abstract :
This paper summarizes the core idea of the T-Recs table recognition system, an integrated system covering block segmentation, table location and a model-free structural analysis of tables. T-Recs works on the output of commercial OCR systems that provide the word bounding box geometry together with the text itself (e.g. Xerox ScanWorX). While T-Recs performs well on a number of document categories, business letters remained a challenging domain: their headers and footers mislead the T-Recs location heuristics, resulting in low recognition precision. Business letters such as invoices are a very interesting domain for industrial applications because of the large number of documents to be analyzed and the importance of the data carried within their tables. Hence, we developed a more restrictive approach, which is implemented in the T-Recs++ prototype. This paper describes the ideas behind table location in T-Recs++ and also proposes a quality evaluation measure that reflects the bottom-up strategy of both T-Recs and T-Recs++. Finally, some results comparing both systems on a collection of business letters are given.
Keywords :
business data processing; document image processing; image segmentation; optical character recognition; OCR; ScanWorX; T-Recs; block segmentation; business letters; model free structural analysis; quality evaluation; table location; table recognition system; word bounding box geometry; Artificial intelligence; Business; Character recognition; Data mining; Information management; Interleaved codes; Optical character recognition software; Prototypes; Text analysis; Text recognition
Conference_Titel :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953843