  • DocumentCode
    2149143
  • Title
    Translation-Inspired OCR
  • Author
    Genzel, Dmitriy ; Popat, Ashok C. ; Spasojevic, Nemanja ; Jahr, Michael ; Senior, Andrew ; Ie, Eugene ; Tang, Frank Yung-Fong
  • Author_Institution
    Google, Inc., Mountain View, CA, USA
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    1339
  • Lastpage
    1343
  • Abstract
    Optical character recognition is carried out using techniques borrowed from statistical machine translation. In particular, the use of multiple simple feature functions in linear combination, along with minimum-error-rate training, integrated decoding, and N-gram language modeling, is found to be remarkably effective across several scripts and languages. Results are presented using both synthetic and real data in five languages.
  • Keywords
    computational linguistics; decoding; image coding; language translation; optical character recognition; N-gram language modeling; integrated decoding; minimum-error-rate training; multiple simple feature function; statistical machine translation; translation-inspired OCR; text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.269
  • Filename
    6065528