  • DocumentCode
    714653
  • Title
    Turkish OCR on mobile and scanned document images
  • Author
    Karasu, Kurtulus; Bastan, Muhammet
  • Author_Institution
    Computer Engineering Department, Turgut Ozal Univ., Ankara, Turkey
  • fYear
    2015
  • fDate
    16-19 May 2015
  • Firstpage
    2074
  • Lastpage
    2077
  • Abstract
    Optical character recognition (OCR) systems are widely used to convert documents into digital form. Many commercial and open source OCR systems are available, but no benchmark for Turkish OCR exists. In this work, we first prepared two publicly available datasets for Turkish OCR, one consisting of scanned document images and the other of document images captured with mobile cameras. We then evaluated the Turkish OCR performance of three popular open source OCR systems (Tesseract, CuneiForm, GOCR) on these datasets. Tesseract outperformed the other two on both datasets.
  • Keywords
    document image processing; mobile computing; optical character recognition; public domain software; CuneiForm; GOCR; Tesseract; Turkish OCR systems; commercial OCR systems; mobile camera captured document images; open source OCR systems; optical character recognition systems; scanned document images; Benchmark testing; Cameras; Character recognition; Histograms; Layout; Mobile communication; Optical character recognition software; Turkish OCR; benchmark; dataset; mobile device; scanner
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communications Applications Conference (SIU), 2015 23rd
  • Conference_Location
    Malatya
  • Type
    conf
  • DOI
    10.1109/SIU.2015.7130278
  • Filename
    7130278