DocumentCode :
2304799
Title :
Skew detection, page segmentation, and script classification of printed document images
Author :
Waked, B. ; Bergler, S. ; Suen, C.Y. ; Khoury, S.
Author_Institution :
Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
Volume :
5
fYear :
1998
fDate :
11-14 Oct 1998
Firstpage :
4470
Abstract :
Automatic processing of international documents presents a number of challenging problems because Optical Character Recognition (OCR) techniques are not available for all languages and all script classes. Document images must be categorized according to their script type first, in our case Roman, Ideographic, or Arabic. We present a set of statistical methods that first detect and correct the skew of a document image. Next, the page is segmented into text and graphical components. The textual components are then segmented into paragraphs and lines; and finally we classify the script type into one of three categories. The system predicts the correct script category in 91% of cases when tested on real-life documents of varying kinds, diverse formats and qualities from many sources.
Keywords :
character recognition; document image processing; image segmentation; natural languages; international documents; page segmentation; printed document images; script classification; script type; skew detection; textual components; Character recognition; Feature extraction; Gabor filters; Image segmentation; Machine intelligence; Natural languages; Optical character recognition software; Optical filters; Pattern recognition; Statistical analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on
Conference_Location :
San Diego, CA
ISSN :
1062-922X
Print_ISBN :
0-7803-4778-1
Type :
conf
DOI :
10.1109/ICSMC.1998.727554
Filename :
727554