DocumentCode :
3489426
Title :
An OCR System with OCRopus for Scientific Documents Containing Mathematical Formulas
Author :
Furukori, F. ; Yamazaki, Shumpei ; Miyagishi, T. ; Shirai, Keigo ; Okamoto, Mitsuo
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
1175
Lastpage :
1179
Abstract :
This paper describes the integration of a mathematical formula recognition module into the open-source OCR system OCRopus. In particular, we consider the identification of inline formulas using existing modules. Text lines containing math formulas are first processed with an N-gram language model, which reduces the number of formula candidates by thresholding the conditional probability of words. The formula candidates are then classified as formulas or text by an SVM using geometric features derived from the bounding boxes of symbols.
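The first stage of the abstract (flagging words whose language-model probability is low) can be illustrated with a minimal sketch. It assumes a bigram model with add-one smoothing, whitespace-tokenized words, and a hand-picked threshold. The record does not state the paper's N-gram order, smoothing method, tokenization, or threshold, and the function names here (`train_bigram`, `cond_prob`, `formula_candidates`) are hypothetical. The second stage (SVM on bounding-box features) is not shown.

```python
from collections import Counter
from typing import List, Tuple

BOS = "<s>"  # sentence-start marker (illustrative choice, not from the paper)


def train_bigram(corpus: List[List[str]]) -> Tuple[Counter, Counter, int]:
    """Count history unigrams and bigrams; return them with a vocabulary size.

    The vocabulary size reserves one extra slot for unseen tokens.
    """
    history: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = [BOS] + sentence
        vocab.update(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, word)] += 1
    return history, bigrams, len(vocab) + 1


def cond_prob(prev: str, word: str, history: Counter, bigrams: Counter,
              vocab_size: int, alpha: float = 1.0) -> float:
    """Add-alpha smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + alpha) / (history[prev] + alpha * vocab_size)


def formula_candidates(line: List[str], history: Counter, bigrams: Counter,
                       vocab_size: int, threshold: float) -> List[int]:
    """Return indices of words whose conditional probability is below threshold."""
    tokens = [BOS] + line
    flagged = []
    for i in range(1, len(tokens)):
        if cond_prob(tokens[i - 1], tokens[i], history, bigrams, vocab_size) < threshold:
            flagged.append(i - 1)  # index into `line`, not `tokens`
    return flagged


# Toy example: a tiny "text-only" training corpus and one line with an inline symbol.
corpus = [
    ["the", "value", "of", "the", "function", "is", "small"],
    ["the", "value", "of", "the", "integral", "is", "large"],
]
history, bigrams, V = train_bigram(corpus)
line = ["the", "value", "of", "x_i", "is", "small"]
print(formula_candidates(line, history, bigrams, V, threshold=0.1))  # → [3]
```

Here the unseen token `x_i` gets P = 1/11 after "of" and is flagged, while ordinary words stay above the threshold. Note that the word following an unseen token also receives a low probability (1/9 for "is"), so the threshold trades recall against false candidates; a later classifier such as the SVM described in the abstract is one way to prune those.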
Keywords :
document image processing; geometry; optical character recognition; probability; support vector machines; OCRopus; SVM; conditional probability; geometric features; mathematical formula recognition module; n-gram language model; open source OCR system; scientific documents; text lines; Accuracy; Image recognition; Layout; Mathematical model; Optical character recognition software; Support vector machines; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.238
Filename :
6628799