DocumentCode
2149143
Title
Translation-Inspired OCR
Author
Genzel, Dmitriy ; Popat, Ashok C. ; Spasojevic, Nemanja ; Jahr, Michael ; Senior, Andrew ; Ie, Eugene ; Tang, Frank Yung-Fong
Author_Institution
Google, Inc., Mountain View, CA, USA
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
1339
Lastpage
1343
Abstract
Optical character recognition is carried out using techniques borrowed from statistical machine translation. In particular, the use of multiple simple feature functions in linear combination, along with minimum-error-rate training, integrated decoding, and N-gram language modeling, is found to be remarkably effective across several scripts and languages. Results are presented using both synthetic and real data in five languages.
Keywords
computational linguistics; decoding; image coding; language translation; optical character recognition; N-gram language modeling; integrated decoding; minimum-error-rate training; multiple simple feature function; statistical machine translation; translation-inspired OCR; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.269
Filename
6065528