DocumentCode :
3755603
Title :
Printed Urdu Nastalique Script Recognition Using Analytical Approach
Author :
Sabahat Mir;Safdar Zaman;Muhammad Waqas Anwar
Author_Institution :
Dept. of Comput. Sci., Institute of Inf. Technol., Abbottabad, Pakistan
fYear :
2015
Firstpage :
334
Lastpage :
340
Abstract :
Urdu is gaining popularity because many people around the world, e.g., in India, Pakistan, and Bangladesh, speak and understand it. Like other languages, e.g., Latin, Chinese, Japanese, Persian, and Arabic, Urdu is being studied by the research community for the development of Optical Character Recognition (OCR) systems. Like Arabic, Urdu script comes in a number of fonts, e.g., Nasakh, Nastalique, and Noori. The presented work uses an analytical approach to recognize machine-printed Urdu Nastalique script. The methodology consists of three major modules: (1) Preprocessing, which applies binarization and filtering to the input image; (2) Main Process, which comprises the sub-phases Line Segmentation, Baseline Detection, Thinning, Segmentation, Smoothing, and Dot Recognition, applied to the preprocessed image; and (3) Recognition, which normalizes the processed image to a standard size of 50×32 and flattens it in row-major order into a row vector of 1600 elements. Finally, a Feed Forward Neural Network classifies the processed input image as one of 271 ligature classes. The neural network has 1600 neurons in the input layer, 60 hidden neurons, and 271 output neurons. The methodology is evaluated on 10 images, 69 lines, and 1292 ligatures. The overall recognition rate is 87%.
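The Recognition module described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives only the image size (50×32), the vector length (1600), and the layer sizes (1600, 60, 271). Everything else here is an assumption made for illustration, including the reading of 50×32 as 50 rows by 32 columns, the nearest-neighbour resizing, the sigmoid and softmax activations, the random untrained weights, and the helper names `normalize`, `to_row_vector`, `init_network`, and `forward`.

```python
import numpy as np

# Sizes taken from the abstract; 50 rows x 32 columns is an assumed orientation.
HEIGHT, WIDTH = 50, 32
N_INPUT = HEIGHT * WIDTH      # 1600 input neurons
N_HIDDEN = 60                 # hidden neurons
N_CLASSES = 271               # ligature classes


def normalize(image, height=HEIGHT, width=WIDTH):
    """Resize a 2-D image to height x width by nearest-neighbour sampling.

    The resizing method is an assumption; the abstract does not specify one.
    """
    image = np.asarray(image)
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[np.ix_(rows, cols)]


def to_row_vector(image):
    """Flatten a 2-D image into a 1-D vector in row-major (C) order."""
    return np.asarray(image).reshape(-1).astype(np.float64)


def init_network(rng):
    """Create small random weights (untrained; for shape illustration only)."""
    return {
        "W1": rng.normal(0.0, 0.01, (N_HIDDEN, N_INPUT)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.01, (N_CLASSES, N_HIDDEN)),
        "b2": np.zeros(N_CLASSES),
    }


def forward(params, x):
    """One forward pass: sigmoid hidden layer, softmax output (assumed)."""
    hidden = 1.0 / (1.0 + np.exp(-(params["W1"] @ x + params["b1"])))
    logits = params["W2"] @ hidden + params["b2"]
    logits = logits - logits.max()          # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


if __name__ == "__main__":
    # A synthetic binary "ligature" image standing in for a segmented one.
    ligature = np.zeros((120, 75))
    ligature[10:60, 5:40] = 1.0

    x = to_row_vector(normalize(ligature))
    probs = forward(init_network(np.random.default_rng(0)), x)
    print(x.shape, probs.shape, int(np.argmax(probs)))
```

With trained weights, the index of the largest output would be read as the predicted ligature class. With the random weights used here, the prediction is meaningless and only the shapes are informative.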
Keywords :
"Optical character recognition software","Character recognition","Image segmentation","Feature extraction","Shape","Optical imaging","Image recognition"
Publisher :
IEEE
Conference_Title :
Frontiers of Information Technology (FIT), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/FIT.2015.65
Filename :
7421024