• DocumentCode
    3145576
  • Title
    Improving Nastalique specific pre-recognition process for Urdu OCR
  • Author
    Javed, Sobia Tariq; Hussain, Sarmad
  • Author_Institution
    Center for Res. in Urdu Language Process., Nat. Univ. of Comput. & Emerging Sci., Pakistan
  • fYear
    2009
  • fDate
    14-15 Dec. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Urdu is written in the Arabic script using the Nastalique writing style. Nastalique is highly cursive and context sensitive, and it is hard to process because only the last character of a ligature sits on the baseline. In addition, it exhibits spatial overlap at both the character and ligature levels. Due to these factors, the placement of dots and other diacritics is also highly contextual and variable. There is now an increasing amount of work on processing and recognizing Nastalique script in order to develop Urdu OCR. This paper proposes improvements to these methods. It focuses on Nastalique-specific pre-processing methods that can be employed before the text recognition process. The recognition and post-recognition processes will be addressed separately.
  • Keywords
    natural language processing; optical character recognition; Arabic script; Nastalique script; Nastalique specific pre-processing method; Nastalique specific pre-recognition process; Nastalique writing style; Urdu OCR; Urdu language; text recognition process; Character recognition; Data mining; Image recognition; Image segmentation; Information retrieval; Optical character recognition software; Optical distortion; Optical sensors; Text recognition; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multitopic Conference, 2009. INMIC 2009. IEEE 13th International
  • Conference_Location
    Islamabad
  • Print_ISBN
    978-1-4244-4872-2
  • Electronic_ISBN
    978-1-4244-4873-9
  • Type
    conf
  • DOI
    10.1109/INMIC.2009.5383111
  • Filename
    5383111