Title :
Texture for script identification
Author :
Busch, Andrew ; Boles, Wageeh W. ; Sridharan, Sridha
Author_Institution :
Sch. of Microelectronic Eng., Griffith Univ., Nathan, Qld., Australia
Abstract :
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
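The abstract does not say which texture features were evaluated or which classifier was used. The sketch below is one illustrative instance of the general idea of texture-based classification. It assumes grey-level co-occurrence matrix (GLCM) statistics as the texture features (contrast, energy and homogeneity over four offsets) and a nearest-centroid classifier. Neither choice is claimed to be the authors' method. The function names `glcm`, `texture_features`, `train_centroids`, `classify` and the `stripes` test images are hypothetical, and images are assumed to be already quantised to integer grey levels in `[0, levels)`.

```python
from math import sqrt


def glcm(image, dx, dy, levels):
    """Normalised grey-level co-occurrence matrix for pixel offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs separated by (dx, dy) whose
    first pixel has grey level i and whose second pixel has grey level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            nx, ny = x + dx, y + dy
            if 0 <= nx < cols and 0 <= ny < rows:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[c / total for c in row] for row in counts]


def texture_features(image, levels, offsets=((1, 0), (0, 1), (1, 1), (-1, 1))):
    """Concatenate GLCM contrast, energy and homogeneity for each offset."""
    features = []
    for dx, dy in offsets:
        p = glcm(image, dx, dy, levels)
        contrast = energy = homogeneity = 0.0
        for i in range(levels):
            for j in range(levels):
                contrast += p[i][j] * (i - j) ** 2
                energy += p[i][j] ** 2
                homogeneity += p[i][j] / (1 + abs(i - j))
        features.extend([contrast, energy, homogeneity])
    return features


def train_centroids(training):
    """Map each label to the mean of its training feature vectors."""
    centroids = {}
    for label, vectors in training.items():
        n = len(vectors)
        centroids[label] = [sum(column) / n for column in zip(*vectors)]
    return centroids


def classify(features, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sqrt(
            sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
        ),
    )


def stripes(size, horizontal):
    """Hypothetical toy 'texture': alternating rows (or columns) of levels 0 and 1."""
    return [[(y if horizontal else x) % 2 for x in range(size)] for y in range(size)]


# Usage: train on two toy textures, then classify a larger unseen sample.
training = {
    "horizontal": [texture_features(stripes(4, True), levels=2)],
    "vertical": [texture_features(stripes(4, False), levels=2)],
}
centroids = train_centroids(training)
print(classify(texture_features(stripes(6, True), levels=2), centroids))  # horizontal
```

The offsets probe texture along the horizontal, vertical and both diagonal directions, which is what lets the striped toy images be told apart. A nearest-centroid rule is used here only because it needs very little training data. Text blocks of real scripts would need richer features and a stronger classifier.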
Keywords :
document image processing; image texture; text analysis; visual databases; document image; script database; script identification; texture features; visual texture; Character recognition; Image analysis; Image databases; Indexing; Optical character recognition software; Sorting; Spatial databases; Text analysis; Training data; Visual databases; Script identification; classification and association rules; clustering; document analysis; texture; wavelets and fractals; Algorithms; Artificial Intelligence; Automatic Data Processing; Documentation; Handwriting; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Models, Statistical; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reading; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Subtraction Technique;
Journal_Title :
IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI :
10.1109/TPAMI.2005.227