• DocumentCode
    3777028
  • Title
    An improved algorithm for identifying mathematical formulas in the images of PDF documents
  • Author
    Chen Liu; Lina Zuo; Xinfu Li; Xuedong Tian
  • Author_Institution
    School of Computer Science and Technology, Hebei University, Baoding, China
  • fYear
    2015
  • Firstpage
    252
  • Lastpage
    256
  • Abstract
    Mathematical formula identification is an important part of mathematical formula recognition and retrieval. Extracting formulas from document images in PDF files is particularly difficult because such images are acquired in diverse ways. To address this problem, this paper designs a method for identifying mathematical formulas in English PDF document images that consists of three steps: judging columns, extracting mathematical formula character blocks, and merging mathematical formula character blocks. By analyzing and summarizing the characteristics of document images in PDF files and their effects on mathematical formula identification, this paper also designs a parameter adjustment algorithm that prevents variations in image resolution from degrading identification performance. Experimental results show that the adaptability of the mathematical formula identification algorithm is improved in some applications.
  • Keywords
    "Image recognition","Optical character recognition software","Character recognition","Layout"
  • Publisher
    IEEE
  • Conference_Title
    Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
  • Print_ISBN
    978-1-4673-8086-7
  • Type
    conf
  • DOI
    10.1109/PIC.2015.7489848
  • Filename
    7489848