Title :
Trainable script identification strategies for Indian languages
Author :
Chaudhury, Santanu ; Sheth, Rabindra
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Delhi, India
Abstract :
Identifying the script in a document page image is of primary importance for a system that processes multilingual documents. This paper proposes three trainable classification schemes for identifying Indian scripts. The first scheme is based on a frequency-domain representation of the horizontal profile of the textual blocks. The other two schemes use connected components extracted from the textual region: one applies a novel Gabor filter-based feature extraction scheme to the connected components, and the other uses the frequency distribution of their width-to-height ratios. Experiments show that the Gabor filter-based scheme provides the most reliable performance, whereas the other two techniques are computationally more efficient.
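The two computationally cheaper feature types named in the abstract can be illustrated with a minimal sketch. It assumes a binarized text block given as a list of rows with 1 for ink and 0 for background. The function names, the 8-connectivity choice, the number of DFT coefficients, and the histogram binning are all illustrative assumptions, not details taken from the paper. The Gabor filter-based features and the trainable classifier are not sketched here.

```python
import cmath
from collections import deque


def horizontal_profile(img):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]


def profile_spectrum(profile, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of the profile."""
    n = len(profile)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(p * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, p in enumerate(profile))
        mags.append(abs(coeff))
    return mags


def component_boxes(img):
    """Bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def aspect_ratio_histogram(boxes, n_bins=4, max_ratio=4.0):
    """Normalized histogram of width/height ratios, clipped to max_ratio."""
    counts = [0] * n_bins
    for top, left, bottom, right in boxes:
        ratio = (right - left + 1) / (bottom - top + 1)
        idx = min(int(ratio / max_ratio * n_bins), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Under these assumptions, `profile_spectrum(horizontal_profile(img), n_coeffs)` and `aspect_ratio_histogram(component_boxes(img))` each yield a fixed-length feature vector that could be passed to any trainable classifier.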
Keywords :
document image processing; feature extraction; frequency-domain analysis; image classification; optical character recognition; Gabor filter; Indian languages; classification schemes; frequency distribution; frequency domain representation; multilingual documents; script identification strategies; Decision trees; Gabor filters; Head; Optical character recognition software; Read only memory; Shape; Text recognition
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
DOI :
10.1109/ICDAR.1999.791873