Title :
Farsi and Latin script identification using curvature scale space features
Author :
Khoddami, Malike ; Behrad, Alireza
Author_Institution :
Fac. of Eng., Shahed Univ., Tehran, Iran
Abstract :
Script recognition is a necessary step before applying an OCR algorithm in multilingual systems. In this paper, a novel method is proposed for identifying Farsi and Latin scripts in bilingual documents using curvature scale space features. The proposed features are rotation and scale invariant and can be used to identify scripts in different fonts. We assume that bilingual documents may contain Farsi and English words and characters side by side; therefore, the algorithm is designed to recognize scripts at the connected-component level. The recognition output is then generalized to the word, line and page levels. Experimental results show that the proposed method achieves good accuracy, especially at the word and connected-component levels.
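The abstract does not give the exact feature set or the rule used to generalize component labels upward, so the following is only a minimal sketch under stated assumptions, not the authors' method. It illustrates the curvature scale space (CSS) idea on one closed glyph contour: the contour is resampled to a fixed number of points (so the scale parameter does not depend on glyph size), smoothed with a Gaussian of increasing width sigma, and the number of curvature zero crossings is counted at each scale. Curvature signs do not change under rotation or uniform scaling, which is the source of the invariance claimed above. The helper names (`resample_closed`, `curvature_at_scale`, `css_zero_crossings`, `majority_label`), the default sigmas and n = 128, and the majority-vote aggregation from components to words are all hypothetical choices made for illustration; a full CSS descriptor would typically use the maxima of the CSS contours rather than raw zero-crossing counts.

```python
from collections import Counter

import numpy as np


def resample_closed(contour, n=128):
    """Resample a closed contour to n points equally spaced in arc length.

    Fixing n makes the smoothing scale (measured in samples) independent of glyph size.
    """
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # repeat the first point to close the loop
    seg = np.hypot(*np.diff(closed, axis=0).T)    # length of each polygon segment
    s = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    t = np.linspace(0.0, s[-1], n, endpoint=False)
    return np.interp(t, s, closed[:, 0]), np.interp(t, s, closed[:, 1])


def curvature_at_scale(x, y, sigma):
    """Curvature after periodic Gaussian smoothing of width sigma (in samples).

    Smoothing and differentiation are both done in the Fourier domain.
    """
    w = 2 * np.pi * np.fft.fftfreq(len(x))        # angular frequency of each FFT bin
    g = np.exp(-0.5 * (sigma * w) ** 2)           # Fourier transform of the Gaussian
    X, Y = np.fft.fft(x) * g, np.fft.fft(y) * g
    xu, yu = np.fft.ifft(1j * w * X).real, np.fft.ifft(1j * w * Y).real
    xuu, yuu = np.fft.ifft(-(w ** 2) * X).real, np.fft.ifft(-(w ** 2) * Y).real
    return (xu * yuu - xuu * yu) / (xu ** 2 + yu ** 2) ** 1.5


def css_zero_crossings(contour, sigmas=(2, 4, 8, 16, 32), n=128):
    """Count curvature sign changes around the closed contour at each scale."""
    x, y = resample_closed(contour, n)
    counts = []
    for sigma in sigmas:
        positive = curvature_at_scale(x, y, sigma) >= 0
        # Compare each sample with its predecessor (cyclically) to count sign changes.
        counts.append(int(np.count_nonzero(positive != np.roll(positive, 1))))
    return counts


def majority_label(labels):
    """Hypothetical aggregation: the most frequent component label wins (ties: first seen)."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    theta = np.linspace(0, 2 * np.pi, 500, endpoint=False)
    circle = np.c_[np.cos(theta), np.sin(theta)]
    r = 1 + 0.5 * np.cos(5 * theta)               # five-lobed shape with concave troughs
    star = np.c_[r * np.cos(theta), r * np.sin(theta)]
    print(css_zero_crossings(circle))             # → [0, 0, 0, 0, 0] (convex at every scale)
    print(css_zero_crossings(star))               # concavities vanish as sigma grows
    print(majority_label(["farsi", "latin", "farsi"]))  # → farsi
```

A word-level decision could then be taken by calling `majority_label` on the labels of the word's connected components, and the same rule applied again for lines and pages; this is one plausible reading of "generalized" in the abstract, not a detail the paper's summary confirms.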
Keywords :
document image processing; feature extraction; optical character recognition; text analysis; Farsi script identification; Latin script identification; bilingual document; curvature scale space features; script recognition; Accuracy; Algorithm design and analysis; Character recognition; Classification algorithms; Pixel; Shape; Curvature scale space; Script identification
Conference_Title :
Neural Network Applications in Electrical Engineering (NEUREL), 2010 10th Symposium on
Conference_Location :
Belgrade
Print_ISBN :
978-1-4244-8821-6
DOI :
10.1109/NEUREL.2010.5644061