DocumentCode
2959576
Title
Automatic language identification of bilingual English and Farsi scripts
Author
Rezaee, Hamideh ; Geravanchizadeh, Masoud ; Razzazi, Farbod
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Tabriz, Tabriz, Iran
fYear
2009
fDate
14-16 Oct. 2009
Firstpage
1
Lastpage
4
Abstract
Printed documents may contain text in several different languages. Therefore, to apply Optical Character Recognition (OCR) to multilingual documents, these languages must first be separated automatically. In this paper, we describe a method for identifying printed Farsi and English text in document images at the line and word levels. The proposed algorithm is based on statistical and shape-based features and achieves an accuracy of about 96.05%.
Keywords
document image processing; optical character recognition; English text identification; Farsi scripts identification; automatic language identification; line level document; word level document; Character recognition; Distribution functions; Image converters; Image segmentation; Machine vision; Natural languages; Optical character recognition software; Optical filters; Shape; Text recognition; Document Image Processing; Language Identification; Multilingual Scripts; OCR;
fLanguage
English
Publisher
ieee
Conference_Titel
Application of Information and Communication Technologies, 2009. AICT 2009. International Conference on
Conference_Location
Baku
Print_ISBN
978-1-4244-4739-8
Electronic_ISBN
978-1-4244-4740-4
Type
conf
DOI
10.1109/ICAICT.2009.5372532
Filename
5372532