• DocumentCode
    2061766
  • Title
    A document image analysis system on parallel processors
  • Author
    Sural, Shamik ; Das, P.K.
  • Author_Institution
    CMC Ltd., Calcutta, India
  • fYear
    1997
  • fDate
    18-21 Dec 1997
  • Firstpage
    527
  • Lastpage
    532
  • Abstract
    The paper presents a document image processing system implemented on a set of parallel processors. A preprocessing stage first corrects skew in scanned document images. The corrected image is segmented and labelled in a two-step minimum containing rectangle (MCR) detection stage. Text block filtering (TBF) is then done heuristically, and the filtered blocks are submitted to a multilayer perceptron (MLP) for character recognition. Smoothing of the document image is done during MLP-based character recognition to reduce the preprocessing time. It also reduces the formation of merged characters, a main source of recognition errors in conventional approaches. During recognition, the MLP also identifies bold words, which are used for automatic indexing of documents. Data is partitioned by exploiting the inherent parallelism in document image data. Communication overhead is small compared to the computation time, so a high degree of parallelization is achieved, reducing the total execution time.
  • Keywords
    character recognition; computational complexity; document image processing; feedforward neural nets; image segmentation; multilayer perceptrons; parallel processing; transputer systems; automatic document indexing; bold word identification; character recognition; communication overhead; computation time; data partitioning; document image analysis system; document image processing system; document image smoothing; heuristic text block filtering; image labelling; image segmentation; multilayer perceptron; parallel processors; parallelization; preprocessing stage; scanned document images; skew correction; total execution time; two-step minimum containing rectangle detection stage; Character recognition; Document image processing; Filtering; Image analysis; Image segmentation; Machine assisted indexing; Multilayer perceptrons; Smoothing methods; Text analysis; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High-Performance Computing, 1997. Proceedings. Fourth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-8186-8067-9
  • Type
    conf
  • DOI
    10.1109/HIPC.1997.634542
  • Filename
    634542