  • DocumentCode
    1585049
  • Title
    Techniques for language identification for hybrid Arabic-English document images
  • Author
    Elgammal, Ahmed M.; Ismail, Mohamed A.
  • Author_Institution
    Department of Computer Science, University of Maryland, College Park, MD, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    1100
  • Lastpage
    1104
  • Abstract
    Because Arabic differs in its characteristics from Romance and Anglo-Saxon languages, recognizing documents written in a mix of these languages requires that the language of the text be identified before the recognition phase. In this paper, three efficient techniques for discriminating between text written in Arabic script and text written in English script are presented and evaluated. These techniques address the language identification problem at both the word level and the text level. The basic features underlying these techniques are the characteristics of the horizontal projection profiles and the run-length histograms of text written in each language. Solving this problem is an important step in building bilingual document image analysis systems capable of processing documents that mix Arabic with Romance and Anglo-Saxon languages.
  • Keywords
    character recognition; document image processing; natural languages; neural nets; text analysis; Anglo Saxon; Arabic; Romance; bilingual document image analysis; document image analysis; language identification; multi-lingual document image analysis; multi-lingual environment; Computer science; Educational institutions; Image analysis; Image recognition; Optical character recognition software; TV; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR 2001)
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2001.953956
  • Filename
    953956
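The abstract names horizontal projection profiles and run-length histograms as the basic features behind the three techniques, but does not define them. The sketch below shows one plausible way to compute both features from a binary text image; it is an illustration under stated assumptions, not the authors' method. The image representation (a list of 0/1 rows, 1 = ink), the function names, and the extra `peak_to_mean_ratio` feature are all hypothetical. The ratio is included only on the assumption, common in Arabic OCR work, that connected Arabic script tends to concentrate ink along a single baseline row.

```python
from collections import Counter
from typing import List

# Assumed representation: a binary image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Horizontal projection profile: the number of ink pixels in each row."""
    return [sum(row) for row in img]


def horizontal_runlength_histogram(img: Image) -> Counter:
    """Histogram of the lengths of horizontal runs of consecutive ink pixels."""
    hist: Counter = Counter()
    for row in img:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                hist[run] += 1  # a background pixel ends the current run
                run = 0
        if run:  # close a run that touches the right edge of the row
            hist[run] += 1
    return hist


def peak_to_mean_ratio(profile: List[int]) -> float:
    """Hypothetical scalar feature: strongest row relative to the mean of non-empty rows."""
    nonzero = [v for v in profile if v > 0]
    if not nonzero:
        return 0.0
    return max(nonzero) * len(nonzero) / sum(nonzero)


# Toy image: two vertical strokes standing on a full-width "baseline" row.
toy = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
]
profile = horizontal_projection(toy)            # [2, 2, 6]
runs = horizontal_runlength_histogram(toy)      # four runs of length 1, one of length 6
ratio = peak_to_mean_ratio(profile)             # 1.8
```

A classifier (the keywords mention neural nets) could then be trained on vectors built from such profiles and histograms; how the paper actually normalizes and combines these features is not stated in this record.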