• DocumentCode
    2143605
  • Title
    Composite Script Identification and Orientation Detection for Indian Text Images
  • Author
    Ghosh, Shamita; Chaudhuri, Bidyut B.
  • Author_Institution
    Computer Vision & Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    294
  • Lastpage
    298
  • Abstract
    A major preprocessing step in a multi-script OCR system is to identify the script type of the test document image. Published papers on script identification usually assume that the test image is in the correct (i.e., 0°) orientation. However, a document may be fed to the system in the wrong orientation by mistake, say at an angle of nearly 180° or ±90°. In this paper we propose a script identification method that works under unknown orientation for all 11 official Indian scripts. We first estimate the skew and counter-rotate the document by the skew angle, which leaves it in either the correct (0°) or the upside-down (180°) orientation. Script identification is then performed by a multi-stage tree classifier using features invariant to 0°/180° orientation. Next, the orientation of the image is determined by a two-class classifier for each script. The proposed method has been tested on a variety of documents, and promising results have been obtained.
  • Keywords
    document image processing; image classification; optical character recognition; trees (mathematics); Indian text images; composite script identification; multiscript OCR; multistage tree classifier; official Indian scripts; orientation detection; preprocessing step; test document image; Accuracy; Feature extraction; Kernel; Pattern recognition; Reservoirs; Support vector machines; Indian document processing; Orientation detection; Reservoir principle; Script type identification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.67
  • Filename
    6065322
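The abstract's central idea is that deskewing leaves only a 0°/180° ambiguity, so script features should be unchanged by a 180° rotation while a separate per-script classifier resolves the orientation. The minimal Python sketch below illustrates that split under stated assumptions. It uses a simplified water-reservoir feature: the record lists the reservoir principle among its keywords but does not describe the authors' actual features. All function names (`rotate180`, `top_reservoir_area`, `bottom_reservoir_area`, `reservoir_features`) are hypothetical, and the reservoir computation reads only the column height profile, so cavities under overhangs are ignored.

```python
from typing import List, Tuple

# Illustrative sketch only; not the authors' feature set.
# A binary text image: 1 = ink, 0 = background; row 0 is the top of the image.
Image = List[List[int]]


def rotate180(img: Image) -> Image:
    """Rotate a binary image by 180 degrees (reverse the row order and each row)."""
    return [row[::-1] for row in img[::-1]]


def column_heights(img: Image) -> List[int]:
    """Height of the topmost ink pixel in each column, counted from the bottom (0 if empty)."""
    n_rows = len(img)
    heights = []
    for c in range(len(img[0])):
        h = 0
        for r in range(n_rows):
            if img[r][c]:
                h = n_rows - r
                break
        heights.append(h)
    return heights


def top_reservoir_area(img: Image) -> int:
    """Simplified top reservoir: background cells that would hold water poured from above.

    Assumption: uses the classic trapped-water computation on the upper column
    profile only, so cavities hidden under overhangs are not counted.
    """
    h = column_heights(img)
    n = len(h)
    left_max, right_max = [0] * n, [0] * n
    running = 0
    for i in range(n):
        running = max(running, h[i])
        left_max[i] = running
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, h[i])
        right_max[i] = running
    return sum(min(left_max[i], right_max[i]) - h[i] for i in range(n))


def bottom_reservoir_area(img: Image) -> int:
    """Bottom reservoir: the top-reservoir computation on the vertically flipped image."""
    return top_reservoir_area(img[::-1])


def reservoir_features(img: Image) -> Tuple[int, int]:
    """Return (invariant, orientation_cue) for a deskewed image.

    A 180-degree rotation swaps top and bottom reservoirs, so their sum is
    unchanged (usable for script identification), while their difference
    changes sign (usable by a per-script 0/180 orientation classifier).
    """
    top = top_reservoir_area(img)
    bottom = bottom_reservoir_area(img)
    return top + bottom, top - bottom


if __name__ == "__main__":
    # A "U"-shaped stroke: open at the top, so it holds water poured from above.
    u_shape = [
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
    ]
    print(reservoir_features(u_shape))             # (4, 4)
    print(reservoir_features(rotate180(u_shape)))  # (4, -4)
```

The sum stays at 4 after rotation while the difference flips from 4 to -4. This is the kind of split the abstract describes: an orientation-invariant cue for script identification, followed by an orientation-sensitive cue that a per-script two-class classifier could use.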