DocumentCode :
3222310
Title :
A multi-font OCR system for printed Telugu text
Author :
Lakshmi, C. Vasantha ; Patvardhan, C.
Author_Institution :
Dept. of Phys., Dayalbagh Educ. Inst., Agra, India
fYear :
2002
fDate :
13-15 Dec. 2002
Firstpage :
7
Lastpage :
17
Abstract :
This work describes the design and development of a Telugu Optical Character Recognition system for printed text (TOSP). The pre-processing tasks considered in this paper are: conversion of a grey scale image to a binary image, image rectification, skew detection and removal, and segmentation of text into lines, words and basic symbols. Basic symbols are identified as the fundamental unit of segmentation, and these are the units passed to the recognizer. The combinations of basic symbols that together form the characters and compound characters of Telugu are then determined to complete the recognition process. The special feature of TOSP is that it is designed to handle multiple sizes and multiple fonts. Further, the output produced by TOSP can be opened directly and edited in any Indian language software that supports transliteration into Telugu script. Several such software packages are popular and available.
Keywords :
image segmentation; optical character recognition; Indian language software; TOSP; Telugu Optical Character Recognition system; binary image; grey scale image; image rectification; pre-processing tasks; printed text; skew detection; skew removal; text segmentation; transliteration facility; Optical character recognition software;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Language Engineering Conference, 2002. Proceedings
Print_ISBN :
0-7695-1885-0
Type :
conf
DOI :
10.1109/LEC.2002.1182284
Filename :
1182284