Title :
Middle Zone Component Extraction and Recognition of Telugu Document Image
Author :
Pratap, R.L. ; Satyaprasad, L. ; Sastry, A.
Author_Institution :
JNTU Coll. of Eng., Hyderabad
Abstract :
Telugu is one of the ancient languages of South India. It has a complex orthography with a large number of distinct character shapes composed of simple and compound characters. Work reported in the literature until recently has been based on the connected component approach. Less attention has been paid to the generalized character model and its application in OCR development. A syllable in the script follows a canonical structure in which a consonant-vowel core is preceded by one or two optional consonants. The formation of a syllable has a unique structural nature. In the present work, structural features of the syllable and the component model are combined to extract middle zone components. The shape of the middle zone components is closely related to a circle, whereas other components have different topological features. A recognition rate of 99 percent is observed with the proposed method.
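The abstract's observation that middle zone components are close to a circle can be illustrated with a simple shape score. The sketch below is not the authors' method; it assumes a component is given as a list of stroke pixel coordinates and uses a hypothetical helper, `radial_cv`, that measures how much the pixel distances from the centroid vary. A circular outline should score near zero, while a straight stroke scores much higher.

```python
import math
from statistics import mean, pstdev


def radial_cv(pixels):
    """Coefficient of variation of pixel distances from the component centroid.

    `pixels` is a list of (x, y) stroke coordinates (an assumed input format).
    Values near 0 indicate a circle-like outline; larger values indicate
    shapes whose pixels lie at widely varying distances from the centroid.
    """
    cx = mean(x for x, _ in pixels)
    cy = mean(y for _, y in pixels)
    radii = [math.hypot(x - cx, y - cy) for x, y in pixels]
    avg = mean(radii)
    if avg == 0:
        raise ValueError("need at least two distinct pixels")
    return pstdev(radii) / avg


# Synthetic test shapes (illustrative only, not taken from the paper):
# a one-pixel-wide ring of radius 10 and a horizontal stroke of 21 pixels.
ring = sorted({
    (round(10 * math.cos(math.radians(a))), round(10 * math.sin(math.radians(a))))
    for a in range(360)
})
stroke = [(x, 0) for x in range(21)]

ring_score = radial_cv(ring)
stroke_score = radial_cv(stroke)
```

Any threshold separating circle-like components from the rest would have to be tuned on real component data; the paper's actual features may differ.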
Keywords :
document image processing; feature extraction; image recognition; OCR; South India; Telugu document image recognition; middle zone component extraction; orthography; script syllable; Character recognition; Data mining; Educational institutions; Feature extraction; Head; Image recognition; Image segmentation; Optical character recognition software; Shape; Writing;
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4376982