• DocumentCode
    3755603
  • Title

    Printed Urdu Nastalique Script Recognition Using Analytical Approach

  • Author

    Sabahat Mir;Safdar Zaman;Muhammad Waqas Anwar

  • Author_Institution
    Dept. of Comput. Sci., Institute of Inf. Technol., Abbottabad, Pakistan
  • fYear
    2015
  • Firstpage
    334
  • Lastpage
    340
  • Abstract
    Urdu is gaining popularity as a language because many people around the world, e.g., in India, Pakistan, and Bangladesh, speak and understand it. Like other languages, e.g., Latin, Chinese, Japanese, Persian, and Arabic, Urdu is being studied by the research community for the development of Optical Character Recognition (OCR) systems. Like Arabic, Urdu script comes in a number of fonts, e.g., Nasakh, Nastalique, and Noori. The presented work uses an analytical approach to recognize machine-written Urdu Nastalique script. The methodology comprises three major modules: (1) Preprocessing, which applies binarization and filtering to the input image; (2) Main Process, which includes the sub-phases Line Segmentation, Baseline Detection, Thinning, Segmentation, Smoothing, and Dot Recognition on the preprocessed image; and (3) Recognition, which normalizes the processed image to a standard size of 50×32 and forms a row vector of 1600 elements in row-major order. Finally, a feed-forward neural network recognizes the processed input image as one of 271 ligature classes. The network has 1600 neurons in the input layer, 60 hidden neurons, and 271 output neurons. The methodology is evaluated on 10 images, 69 lines, and 1292 ligatures. The overall recognition rate is 87%.
  • Keywords
    "Optical character recognition software","Character recognition","Image segmentation","Feature extraction","Shape","Optical imaging","Image recognition"
  • Publisher
    ieee
  • Conference_Titel
    Frontiers of Information Technology (FIT), 2015 13th International Conference on
  • Type

    conf

  • DOI
    10.1109/FIT.2015.65
  • Filename
    7421024
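
The recognition stage described in the abstract (a 50×32 image flattened in row-major order into a 1600-element vector, then fed to a 1600-60-271 feed-forward network) can be sketched roughly as follows. This is only an illustration of the stated dimensions, not the authors' implementation: the abstract does not give the activation functions, the training procedure, or the weights, so the sigmoid hidden layer, softmax output, random initialization, and all function names here are assumptions.

```python
import math
import random

ROWS, COLS = 50, 32                        # normalized image size from the abstract
N_IN, N_HID, N_OUT = ROWS * COLS, 60, 271  # 1600 inputs, 60 hidden, 271 classes


def flatten_row_major(image):
    """Concatenate the rows of a ROWS x COLS image into one 1600-element vector."""
    assert len(image) == ROWS and all(len(row) == COLS for row in image)
    return [pixel for row in image for pixel in row]


def init_layer(n_out, n_in, rng):
    """Assumed initialization: small random weights, zero biases (untrained)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine layer: one weighted sum plus bias per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    """Assumed hidden activation; the abstract does not name one."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    """Assumed output activation, shifted by the max for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, hidden, output):
    """Flatten the image and run it through the 1600-60-271 network."""
    x = flatten_row_major(image)
    h = sigmoid(dense(x, *hidden))
    return softmax(dense(h, *output))


rng = random.Random(0)
hidden = init_layer(N_HID, N_IN, rng)
output = init_layer(N_OUT, N_HID, rng)

# Stand-in binary image; a real input would be a segmented, normalized ligature.
image = [[rng.randint(0, 1) for _ in range(COLS)] for _ in range(ROWS)]
flat = flatten_row_major(image)
probs = forward(image, hidden, output)
predicted_class = max(range(N_OUT), key=probs.__getitem__)
```

With untrained random weights the predicted class is meaningless; the sketch only shows how the 50×32 normalization yields the 1600 inputs and how the layer sizes fit together.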