• DocumentCode
    3153298
  • Title

    A document segmentation, classification and recognition system

  • Author

    Shih, Frank Y. ; Chen, Shy-Shyan ; Hung, D. C. Douglas ; Ng, Peter A.

  • Author_Institution
    Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
  • fYear
    1992
  • fDate
    15-18 Jun 1992
  • Firstpage
    258
  • Lastpage
    267
  • Abstract
    This paper discusses a document segmentation, classification and recognition system for automatically reading daily-received office documents that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics and half-tone pictures. First, block segmentation employs a two-step run-length smoothing algorithm to decompose a document into single-mode blocks. Next, block classification uses clustering rules to assign each block to one of the following classes: text, horizontal or vertical lines, graphics, or pictures. Each text block is separated into isolated characters using projection profiles, and the characters are translated into ASCII codes by a font- and size-independent character recognition subsystem. Logo pictures are discriminated from half-tone pictures, identified, and converted into symbolic words. The experimental results show that the proposed system is capable of correctly reading different styles of mixed-mode printed documents.
  • Keywords
    document handling; image recognition; office automation; ASCII codes; Logo pictures; block classification; block segmentation; classification; clustering rules; complex layout structures; daily-received office documents; document segmentation; half-tone pictures; mixed-mode contents; mixed-mode printed documents; multiple columns; projection profiles; recognition system; single-mode blocks; size-independent character recognition subsystem; symbolic words; two-step run-length smoothing algorithm; vertical lines; Character recognition; Computer graphics; Computer vision; Facsimile; Image coding; Image processing; Image segmentation; Information science; Smoothing methods; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems Integration, 1992. ICSI '92., Proceedings of the Second International Conference on
  • Conference_Location
    Morristown, NJ
  • Print_ISBN
    0-8186-2697-6
  • Type

    conf

  • DOI
    10.1109/ICSI.1992.217295
  • Filename
    217295
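
The abstract names two techniques, run-length smoothing and projection profiles, without giving their details. The sketch below illustrates the classic run-length smoothing algorithm (a horizontal pass and a vertical pass combined with a logical AND) and a simple projection-profile splitter. It is an illustration under stated assumptions, not the authors' two-step variant or their clustering rules: the function names (`smooth_run`, `rlsa`, `vertical_profile`, `profile_segments`), the thresholds, and the list-of-lists image with 1 for black and 0 for white are all hypothetical choices made for this example.

```python
def smooth_run(row, threshold):
    """Fill white runs (0s) of length <= threshold lying between two black pixels (1s)."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Classic RLSA (assumed here): smooth rows and columns separately, then AND them."""
    horizontal = [smooth_run(row, h_threshold) for row in image]
    # Transpose, smooth each column as if it were a row, then transpose back.
    columns = [smooth_run(list(col), v_threshold) for col in zip(*image)]
    vertical = [list(r) for r in zip(*columns)]
    return [[h & v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def vertical_profile(image):
    """Count black pixels in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def profile_segments(profile):
    """Return half-open (start, end) index pairs covering each run of non-zero entries."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

In this sketch, applying `profile_segments(vertical_profile(block))` to a single text line would give candidate column ranges for isolated characters, since empty columns in the profile mark the gaps between them. Touching or overlapping characters would need handling that this example does not attempt.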