Title :
An OCR System with OCRopus for Scientific Documents Containing Mathematical Formulas
Author :
Furukori, F. ; Yamazaki, Shumpei ; Miyagishi, T. ; Shirai, Keigo ; Okamoto, Mitsuo
Abstract :
This paper describes the integration of a mathematical formula recognition module into an open source OCR system, OCRopus. In particular, we consider the identification of inline formulas using existing modules. Text lines that include math formulas are first processed with an n-gram language model, which reduces the number of formula candidates by thresholding the conditional probability of words. The formula candidates are then classified as formula or text by an SVM using geometric features associated with the bounding boxes of symbols.
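
The first stage described above (thresholding the conditional probability of words under an n-gram model) can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram order, the add-one smoothing, the toy corpus, the threshold value, and the function names `train_bigram`, `cond_prob`, and `formula_candidates` are all assumptions made for illustration. The second stage (SVM classification on bounding-box features) is not shown.

```python
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = [START] + list(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def cond_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev) (smoothing is an assumption)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def formula_candidates(tokens, unigrams, bigrams, threshold):
    """Return (index, token) pairs whose conditional probability is below threshold.

    Tokens that are unlikely given the preceding word are kept as formula
    candidates; likely tokens are treated as ordinary text.
    """
    padded = [START] + list(tokens)
    return [
        (i, word)
        for i, (prev, word) in enumerate(zip(padded, padded[1:]))
        if cond_prob(prev, word, unigrams, bigrams) < threshold
    ]


if __name__ == "__main__":
    # Toy training text standing in for a real language-model corpus.
    corpus = [
        ["the", "value", "is", "small"],
        ["the", "function", "is", "continuous"],
        ["the", "value", "is", "large"],
    ]
    unigrams, bigrams = train_bigram(corpus)
    line = ["the", "value", "x_i", "is", "small"]
    print(formula_candidates(line, unigrams, bigrams, threshold=0.15))
```

In this sketch the word following an unseen token can also fall below the threshold, so the candidate set over-generates; a later classification stage, such as the SVM the abstract mentions, would be responsible for rejecting such false candidates.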
Keywords :
document image processing; geometry; optical character recognition; probability; support vector machines; OCRopus; SVM; conditional probability; geometric features; mathematical formula recognition module; n-gram language model; open source OCR system; scientific documents; text lines; Accuracy; Image recognition; Layout; Mathematical model; Optical character recognition software; Support vector machines; Text recognition;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.238