Title :
Optical character recognition system for Urdu
Author :
Sardar, Shuwair ; Wahab, Abdul
Abstract :
Urdu OCR is a subject of real importance. Urdu has about 60 to 80 million speakers and is ranked as the fifth most spoken language, with 4.7 percent of the total world population; it is spoken in South Asia, mainly in Pakistan and India. A huge amount of valuable Urdu literature, from philosophy to the sciences, is deteriorating and effectively unusable because it has not yet been digitized. More importantly, many native speakers of Urdu, especially in Pakistan, can read and write only Urdu, and very little material is available to them on the internet or in digitized form. Because of its complexity, Urdu OCR has seen only sparse and partial research and implementation work, so no complete OCR system for the Urdu language exists to date. Moreover, most existing Urdu OCR research is tied to particular scripts, fonts, and text environments, which is a further obstacle to building a complete OCR system. We have therefore researched and partly implemented an online and offline OCR system that is independent of Urdu scripts and fonts.
Keywords :
natural language processing; optical character recognition; text analysis; Urdu language; Urdu scripts; optical character recognition system; Character recognition; Compounds; Feature extraction; Image recognition; Optical character recognition software; Pixel; Writing; Component Extraction; Line Extraction; OCR; Pre-processing; Primary Stroke; Secondary Stroke; Segmentation;
Conference_Title :
Information and Emerging Technologies (ICIET), 2010 International Conference on
Conference_Location :
Karachi
Print_ISBN :
978-1-4244-8001-2
DOI :
10.1109/ICIET.2010.5625694