Title :
A generic method for determining the up/down orientation of text in Roman and non-Roman scripts
Author :
Aradhye, Hrishikesh B.
Author_Institution :
SRI Int., Menlo Park, CA, USA
Date :
29 Aug.-1 Sept. 2005
Abstract :
This paper presents a method for determining the up/down orientation of text in a scanned document of unknown orientation. The method analyzes the "open" portions of text blobs to determine the direction in which they face. By comparing the densities of blobs that open in each of a pair of opposite directions (e.g., right or left), the method can establish the direction in which the text as a whole is oriented. We first discuss the orientation of Roman text, which relies on the asymmetry in the openness of Roman letters in the horizontal direction. For non-Roman text such as Pashto and Hebrew, we use a training dataset to determine the direction that is the most asymmetric, and therefore the most useful for orientation, and then use that direction to orient the text. This work can be used for automated orientation of mail, checks in ATM envelopes, and scanned, copied, or faxed documents.
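As a rough illustration of the voting idea described in the abstract, the following Python sketch labels each blob as opening to the left or to the right and compares the two counts. It is not the paper's implementation. The openness measure (empty space between the ink and the bounding-box edge), the function names, and the `upright_majority` parameter are all assumptions made for this example. In particular, the abstract does not say which side upright Roman letters tend to open toward, so that choice is left as a parameter.

```python
from typing import Dict, List, Sequence

# A blob is a binary image cropped to its bounding box:
# rows of 0 (background) and 1 (ink). This representation is an assumption.
Blob = Sequence[Sequence[int]]


def horizontal_openness(blob: Blob) -> str:
    """Return 'left', 'right', or 'none' for the side with more empty
    space between the bounding-box edge and the first ink pixel."""
    left_gap = 0
    right_gap = 0
    for row in blob:
        ink_columns = [x for x, value in enumerate(row) if value]
        if not ink_columns:
            continue  # a row without ink says nothing about openness
        left_gap += ink_columns[0]
        right_gap += len(row) - 1 - ink_columns[-1]
    if left_gap == right_gap:
        return "none"
    return "left" if left_gap > right_gap else "right"


def rotate_180(blob: Blob) -> List[List[int]]:
    """Rotate a blob by 180 degrees (reverse the rows and each row)."""
    return [list(reversed(row)) for row in reversed(blob)]


def is_upside_down(blobs: Sequence[Blob], upright_majority: str = "right") -> bool:
    """Vote over all blobs. `upright_majority` is the side that most blobs
    are ASSUMED to open toward when the text is upright (hypothetical default)."""
    counts: Dict[str, int] = {"left": 0, "right": 0, "none": 0}
    for blob in blobs:
        counts[horizontal_openness(blob)] += 1
    flipped_majority = "left" if upright_majority == "right" else "right"
    return counts[flipped_majority] > counts[upright_majority]


# A crude "c"-like shape that opens to the right.
C_SHAPE = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
```

With these definitions, `is_upside_down([C_SHAPE] * 3)` votes for upright text, while the same blobs passed through `rotate_180` vote for upside-down text. For non-Roman scripts, the abstract indicates that the direction itself would be chosen from a training dataset rather than fixed to the horizontal as it is here.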
Keywords :
document image processing; natural languages; optical character recognition; text analysis; Roman scripts; non-Roman scripts; text blobs; training dataset; up-down text orientation; Automation; Character recognition; Facsimile; Frequency; Ink; Optical character recognition software; Postal services; Real time systems; Text recognition; Watermarking
Conference_Title :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
Print_ISBN :
0-7695-2420-6
DOI :
10.1109/ICDAR.2005.13