DocumentCode :
1990518
Title :
Language identification in historical Afghan manuscripts
Author :
Farooq, Faisal ; Govindaraju, Venu
Author_Institution :
CEDAR, State Univ. of New York, Amherst, NY
fYear :
2007
fDate :
12-15 Feb. 2007
Firstpage :
1
Lastpage :
4
Abstract :
Automatic language identification is an important step prior to optical character recognition (OCR). In this paper we present a system to discriminate between Arabic and Persian in historical Afghan manuscripts. Classification is performed at the sub-sentence level. We propose a feature extraction algorithm for sub-sentences based on Gabor filters, followed by classification with a support vector machine (SVM). Overall precisions of 96.72% and 94.90% are obtained for Persian and Arabic, respectively.
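The abstract names two stages, Gabor-filter features and an SVM classifier, but gives no parameters. The minimal NumPy sketch below illustrates that general pipeline under stated assumptions: the filter bank (two scales, four orientations, wavelength equal to twice the scale), the mean and standard-deviation pooling of response magnitudes, and every function name are hypothetical. A linear SVM trained by Pegasos-style subgradient descent stands in for whatever SVM formulation the authors actually used. It is an illustration of the technique, not the paper's method.

```python
import numpy as np

def gabor_kernel(sigma, theta, wavelength, gamma=0.5):
    """Real (cosine) part of a 2-D Gabor kernel; parameters are illustrative."""
    half = int(np.ceil(3 * sigma))                # cover about 3 sigma on each side
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    kernel = envelope * np.cos(2 * np.pi * x_t / wavelength)
    return kernel - kernel.mean()                 # zero-mean: flat regions give ~0

def filter2d(image, kernel):
    """'Same'-size linear convolution computed with zero-padded FFTs."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    top, left = kh // 2, kw // 2
    return full[top:top + H, left:left + W]

def gabor_features(image, sigmas=(2.0, 4.0), n_orient=4):
    """Hypothetical feature vector: mean and std of |response| per filter."""
    feats = []
    for sigma in sigmas:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            resp = np.abs(filter2d(image, gabor_kernel(sigma, theta, 2 * sigma)))
            feats.extend([resp.mean(), resp.std()])
    return np.array(feats)

def standardize(X, mu=None, sd=None):
    """Z-score features; reuse training mu/sd when transforming new data."""
    if mu is None:
        mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    return (X - mu) / sd, mu, sd

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style hinge-loss subgradient descent; labels y in {-1, +1}.
    For simplicity the appended bias term is regularized along with w."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])          # append bias column
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            margin = y[i] * (Xb[i] @ w)
            w *= 1 - eta * lam                    # shrink from the L2 penalty
            if margin < 1:                        # hinge loss is active
                w += eta * y[i] * Xb[i]
    return w

def predict(w, X):
    """Sign of the linear decision function, mapping 0 to +1."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

Mapping Persian to +1 and Arabic to -1 (or the reverse) is an arbitrary labeling choice; each sub-sentence image would be turned into one feature vector, standardized with the training statistics, and passed to `predict`.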
Keywords :
Gabor filters; feature extraction; history; image classification; natural language processing; optical character recognition; support vector machines; Persian; automatic language identification; feature extraction algorithm; historical Afghan manuscripts; sub-sentence classification; support vector machine; Character recognition; Optical character recognition software; Optical filters; Support vector machine classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-0778-1
Electronic_ISBN :
978-1-4244-1779-8
Type :
conf
DOI :
10.1109/ISSPA.2007.4555588
Filename :
4555588