Printed Urdu Nastalique Script Recognition Using Analytical Approach

Author

Sabahat Mir;Safdar Zaman;Muhammad Waqas Anwar

Author_Institution

Dept. of Comput. Sci., Institute of Inf. Technol., Abbottabad, Pakistan

fYear

2015

Firstpage

334

Lastpage

340

Abstract

Urdu is gaining popularity as a language because many people around the world, e.g., in India, Pakistan, and Bangladesh, speak and understand it. Like other languages, e.g., Latin, Chinese, Japanese, Persian, and Arabic, Urdu is being studied by the research community for the development of Optical Character Recognition (OCR) systems. Like Arabic, Urdu script comes in a number of fonts, e.g., Nasakh, Nastalique, and Noori. The presented work uses an analytical approach to recognize machine-written Urdu Nastalique script. The methodology comprises three major modules: (1) Preprocessing, which applies binarization and filtering to the input image; (2) Main Process, which performs Line Segmentation, Baseline Detection, Thinning, Segmentation, Smoothing, and Dot Recognition on the preprocessed image; and (3) Recognition, which normalizes the processed image to a standard size of 50×32 and forms a row vector of 1600 elements in row-major order. Finally, a Feed Forward Neural Network classifies the processed input image as one of 271 ligature classes. The network has 1600 input neurons, 60 hidden neurons, and 271 output neurons. The methodology is evaluated on 10 images, 69 lines, and 1292 ligatures. The overall recognition rate is 87%.
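
The recognition stage described in the abstract (a 50×32 image flattened in row-major order into 1600 values, then passed through a 1600-60-271 feed-forward network) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the activation functions, weight initialization, or training procedure, so the tanh hidden layer, softmax output, and small random untrained weights below are assumptions, and all function names are hypothetical.

```python
import math
import random

ROWS, COLS = 50, 32                          # normalized ligature size (from the abstract)
N_IN, N_HID, N_OUT = ROWS * COLS, 60, 271    # 1600-60-271 layer sizes (from the abstract)


def to_row_vector(img):
    """Flatten a 50x32 image (list of rows) into 1600 values in row-major order."""
    if len(img) != ROWS or any(len(row) != COLS for row in img):
        raise ValueError("expected a 50x32 image")
    return [float(px) for row in img for px in row]


def init_layer(n_in, n_out, rng):
    """Assumption: small uniform random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(x, layer):
    """Compute W @ x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x, hidden, output):
    """Assumption: tanh hidden units and a softmax over the 271 ligature classes."""
    h = [math.tanh(a) for a in affine(x, hidden)]
    z = affine(h, output)
    z_max = max(z)                           # subtract the max for numerical stability
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hidden = init_layer(N_IN, N_HID, rng)
output = init_layer(N_HID, N_OUT, rng)

# Toy binary image with a single foreground pixel at row 1, column 0.
image = [[0] * COLS for _ in range(ROWS)]
image[1][0] = 1

x = to_row_vector(image)
probs = forward(x, hidden, output)
predicted_class = max(range(N_OUT), key=probs.__getitem__)
```

In row-major order the pixel at row r and column c lands at index r × 32 + c, so the foreground pixel above ends up at index 32 of the vector. With untrained weights the predicted class is meaningless; the sketch only shows how the stated dimensions fit together.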

Keywords

"Optical character recognition software","Character recognition","Image segmentation","Feature extraction","Shape","Optical imaging","Image recognition"

Publisher

IEEE

Conference_Titel

Frontiers of Information Technology (FIT), 2015 13th International Conference on

Type

conf

DOI

10.1109/FIT.2015.65

Filename

7421024