Title :
A document image analysis system on parallel processors
Author :
Sural, Shamik ; Das, P.K.
Author_Institution :
CMC Ltd., Calcutta, India
Abstract :
The paper presents a document image processing system implemented on a set of parallel processors. A preprocessing stage first corrects skew in scanned document images. The corrected image is then segmented and labelled in a two-step minimum containing rectangle (MCR) detection stage. Text block filtering (TBF) is performed heuristically, and the filtered blocks are submitted to a multilayer perceptron (MLP) for character recognition. Smoothing of the document image is done during MLP-based character recognition to reduce the preprocessing time. This also reduces the formation of merged characters, a main source of recognition errors in conventional approaches. During recognition the MLP also identifies bold words, which are used for automatic indexing of documents. Data is partitioned by exploiting the inherent parallelism in document image data. Communication overhead is small compared to the computation time, so a high degree of parallelization is achieved, reducing the total execution time.
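The abstract does not say how the image is partitioned or how the containing rectangles are found. As a rough illustration only, the sketch below assumes the page is a binary image (1 = ink) split into contiguous row strips, one per processor, and that each processor finds the bounding rectangle of every 4-connected component inside its own strip. The function names `partition_rows` and `bounding_rectangles` are hypothetical, and merging rectangles that straddle strip boundaries (plausibly a second step) is not shown.

```python
from collections import deque


def partition_rows(height, workers):
    """Split rows [0, height) into `workers` contiguous, near-equal strips."""
    base, extra = divmod(height, workers)
    strips, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # spread the remainder rows
        strips.append((start, start + size))
        start += size
    return strips


def bounding_rectangles(image, row_start=0, row_end=None):
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground
    components, restricted to rows [row_start, row_end) of a binary image."""
    if row_end is None:
        row_end = len(image)
    width = len(image[0]) if image else 0
    seen = set()
    boxes = []
    for r in range(row_start, row_end):
        for c in range(width):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = row_start <= ny < row_end and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 4x4 page with two ink blobs.
image = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
strips = partition_rows(len(image), 2)
per_strip = [bounding_rectangles(image, a, b) for a, b in strips]
```

Each strip needs only its own rows, so in this sketch the per-strip work is independent and the only data exchanged afterwards is the short list of rectangles, which is consistent with the abstract's remark that communication is small relative to computation.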
Keywords :
character recognition; computational complexity; document image processing; feedforward neural nets; image segmentation; multilayer perceptrons; parallel processing; transputer systems; automatic document indexing; bold word identification; communication overhead; computation time; data partitioning; document image analysis system; document image processing system; document image smoothing; heuristic text block filtering; image labelling; parallel processors; parallelization; preprocessing stage; scanned document images; skew correction; total execution time; two-step minimum containing rectangle detection stage; filtering; image analysis; machine assisted indexing; smoothing methods; text analysis; text recognition
Conference_Title :
High-Performance Computing, 1997. Proceedings. Fourth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-8186-8067-9
DOI :
10.1109/HIPC.1997.634542