Title :
An improved algorithm for identifying mathematical formulas in the images of PDF documents
Author :
Chen Liu; Lina Zuo; Xinfu Li; Xuedong Tian
Author_Institution :
School of Computer Science and Technology, Hebei University, Baoding, China
Abstract :
Mathematical formula identification is an important part of mathematical formula recognition and retrieval. Extracting formulas from document images in PDF files is more difficult because these images are acquired in diverse ways. To address this problem, this paper designs a method for identifying mathematical formulas in English PDF document images, consisting of three steps: judging columns, extracting mathematical formula character blocks, and merging mathematical formula character blocks. By analyzing and summarizing the characteristics of document images in PDF files and their effects on mathematical formula identification, this paper also designs a parameter adjustment algorithm to avoid the influence of resolution variation on identification performance. Experimental results show that the adaptability of the mathematical formula identification algorithm is improved in some applications.
Keywords :
"Image recognition","Optical character recognition software","Character recognition","Layout"
Conference_Title :
Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8086-7
DOI :
10.1109/PIC.2015.7489848