• DocumentCode
    1582122
  • Title

    Applying the T-Recs table recognition system to the business letter domain

  • Author

    Kieninger, Thomas ; Dengel, Andreas

  • Author_Institution
    Inf. Manage. Dept., DFKI GmbH, Kaiserslautern, Germany
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    518
  • Lastpage
    522
  • Abstract
    This paper summarizes the core idea of the T-Recs table recognition system, an integrated system covering block segmentation, table location, and a model-free structural analysis of tables. T-Recs works on the output of commercial OCR systems that provide the word bounding box geometry together with the text itself (e.g., Xerox ScanWorX). While T-Recs performs well on a number of document categories, business letters have remained a challenging domain because the T-Recs location heuristics are misled by their headers or footers, resulting in low recognition precision. Business letters such as invoices are a very interesting domain for industrial applications due to the large number of documents to be analyzed and the importance of the data carried within their tables. Hence, we developed a more restrictive approach, which is implemented in the T-Recs++ prototype. This paper describes the ideas of the T-Recs++ location and also proposes a quality evaluation measure that reflects the bottom-up strategy of either T-Recs or T-Recs++. Finally, some results comparing both systems on a collection of business letters are given.
  • Keywords
    business data processing; document image processing; image segmentation; optical character recognition; OCR; ScanWorX; T-Recs; block segmentation; business letters; model free structural analysis; quality evaluation; table location; table recognition system; word bounding box geometry; Artificial intelligence; Business; Character recognition; Data mining; Information management; Interleaved codes; Optical character recognition software; Prototypes; Text analysis; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type

    conf

  • DOI
    10.1109/ICDAR.2001.953843
  • Filename
    953843