DocumentCode :
2446167
Title :
Preparing Persian/Arabic Scanned Images for OCR
Author :
Shirali-Shahreza, Sajad ; Manzuri-Shalmani, M.T. ; Shirali-Shahreza, M. Hassan
Author_Institution :
Dept. of Comput. Eng., Sharif Univ. of Tech., Tehran
Volume :
1
fYear :
0
fDate :
0-0 0
Firstpage :
1332
Lastpage :
1336
Abstract :
Digital documents are now widely used, so converting written documents such as books into digital form has become unavoidable. The most popular method for this conversion is optical character recognition (OCR). Documents are usually scanned, and the scanned images are then passed to an OCR system. Scanned images need some preprocessing before OCR can process them efficiently. In this paper, we introduce a method for preparing scanned Persian/Arabic printed texts for OCR. Our method takes into account special features of Persian/Arabic scripts, such as dots and connecting characters. The main phases of our work are converting the grayscale image to a binary image, removing straight lines and frames, and identifying picture components.
Keywords :
document image processing; image segmentation; optical character recognition; pattern recognition; Persian/Arabic scanned images; binary image; digital documents; grayscale image; image processing; optical character recognition; page segmentation; pattern recognition; picture components identification; written document conversion; Books; Character recognition; Electrostatic precipitators; Gray-scale; Image converters; Image processing; Image segmentation; Joining processes; Natural languages; Optical character recognition software; Arabic/Persian Document; Image Processing; OCR; Page Segmentation; Pattern Recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
Type :
conf
DOI :
10.1109/ICTTA.2006.1684574
Filename :
1684574