• DocumentCode
    1583338
  • Title
    Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line
  • Author
    Pal, U.; Chaudhuri, B.B.
  • Author_Institution
    Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    790
  • Lastpage
    794
  • Abstract
    In general, a document page may contain several script forms. For optical character recognition (OCR) of such a page, the scripts must be separated before they are fed to their individual OCR systems. An automatic technique is proposed for identifying printed Roman, Chinese, Arabic, Devnagari and Bangla text lines within a single document. Shape-based features, statistical features and features derived from the concept of a water reservoir are used for script identification. The proposed scheme achieves an accuracy of about 97.33%.
  • Keywords
    document image processing; feature extraction; natural languages; optical character recognition; Arabic; Bangla script; Chinese; Devnagari; English; OCR systems; automatic script line identification; automatic technique; document page; optical character recognition; printed Roman text; printed text line identification; script forms; shape based features; statistical features; water reservoir; Computer vision; Fractals; Optical character recognition software; Optical devices; Pattern recognition; Probability; Reservoirs; Shape; Water resources; Water storage;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2001.953896
  • Filename
    953896