• DocumentCode
    2959576
  • Title
    Automatic language identification of bilingual English and Farsi scripts
  • Author
    Rezaee, Hamideh ; Geravanchizadeh, Masoud ; Razzazi, Farbod
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Tabriz, Tabriz, Iran
  • fYear
    2009
  • fDate
    14-16 Oct. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Printed documents often contain more than one language. To apply Optical Character Recognition (OCR) to multilingual documents, these languages must first be separated automatically. In this paper, we describe a method for identifying printed Farsi and English text in document images at the line and word levels. The proposed algorithm is based on statistical and shape-based features. The accuracy of this method is around 96.05%.
  • Keywords
    document image processing; optical character recognition; English text identification; Farsi scripts identification; automatic language identification; line level document; word level document; Character recognition; Distribution functions; Image converters; Image segmentation; Machine vision; Natural languages; Optical character recognition software; Optical filters; Shape; Text recognition; Language Identification; Multilingual Scripts; OCR
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Application of Information and Communication Technologies, 2009. AICT 2009. International Conference on
  • Conference_Location
    Baku
  • Print_ISBN
    978-1-4244-4739-8
  • Electronic_ISBN
    978-1-4244-4740-4
  • Type
    conf
  • DOI
    10.1109/ICAICT.2009.5372532
  • Filename
    5372532