DocumentCode
3777028
Title
An improved algorithm for identifying mathematical formulas in the images of PDF documents
Author
Chen Liu; Lina Zuo; Xinfu Li; Xuedong Tian
Author_Institution
School of Computer Science and Technology, Hebei University, Baoding, China
fYear
2015
Firstpage
252
Lastpage
256
Abstract
Mathematical formula identification is an important part of mathematical formula recognition and retrieval. Extracting formulas from document images in PDF files is more difficult because these images are acquired in diverse ways. To address this problem, this paper designs a method for identifying mathematical formulas in English PDF document images in three steps: judging columns, extracting mathematical formula character blocks, and merging mathematical formula character blocks. By analyzing and summarizing the characteristics of document images in PDF files and their effects on mathematical formula identification, this paper also designs a parameter adjustment algorithm that prevents resolution variation from degrading identification performance. Experimental results show that the adaptability of the mathematical formula identification algorithm is improved in some applications.
Keywords
"Image recognition","Optical character recognition software","Character recognition","Layout"
Publisher
IEEE
Conference_Titel
Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
Print_ISBN
978-1-4673-8086-7
Type
conf
DOI
10.1109/PIC.2015.7489848
Filename
7489848