Automatic language identification of bilingual English and Farsi scripts

Author

Rezaee, Hamideh ; Geravanchizadeh, Masoud ; Razzazi, Farbod

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Tabriz, Tabriz, Iran

fYear

2009

fDate

14-16 Oct. 2009

Firstpage

1

Lastpage

4

Abstract

Printed documents may contain several different languages. To apply Optical Character Recognition (OCR) to multilingual documents, these languages must first be separated automatically. In this paper, we describe a method for identifying printed Farsi and English text in document images at the line and word levels. The proposed algorithm is based on statistical and shape-based features. The method achieves an accuracy of approximately 96.05%.

Keywords

document image processing; optical character recognition; English text identification; Farsi scripts identification; automatic language identification; line level document; word level document; Character recognition; Distribution functions; Image converters; Image segmentation; Machine vision; Natural languages; Optical character recognition software; Optical filters; Shape; Text recognition; Document Image Processing; Language Identification; Multilingual Scripts; OCR;

fLanguage

English

Publisher

IEEE

Conference_Titel

Application of Information and Communication Technologies, 2009. AICT 2009. International Conference on

Conference_Location

Baku

Print_ISBN

978-1-4244-4739-8

Electronic_ISBN

978-1-4244-4740-4

Type

conf

DOI

10.1109/ICAICT.2009.5372532

Filename

5372532