• DocumentCode
    1864379
  • Title
    Automated segmentation and classification of chemical and other equations from document images
  • Author
    Jana, Prerana ; Majumdar, Anubhab ; Layek, Ashish Kumar ; Mandal, Sekhar ; Das, Amit Kumar
  • Author_Institution
    Comput. Sci. & Technol. Dept., IIEST Shibpur, Shibpur, India
  • fYear
    2015
  • fDate
    4-7 Jan. 2015
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Segmentation of mathematical equations from document images is already a major research area for improving the performance of OCR systems. Although chemical equations share spatial properties with non-chemical equations (for example, mathematical equations), efforts to segment them have yet to be explored. This paper presents a novel method for segmenting and identifying chemical and other equations in heterogeneous document images that may contain graphics, tables, and text, and then classifying them into two categories: chemical and non-chemical equations. This study, the first of its kind as far as our knowledge goes, not only improves OCR performance but also leads to the creation of a chemical database and the formation of a bond electron matrix from chemical equations or formulae. In the proposed method, equations are extracted using morphological operators and histogram analysis, and the extracted equations are classified using an open source OCR engine. The effectiveness of the proposed method is demonstrated by testing it on 152 document images. Test results show an accuracy of 97.4% and 97.45% for segmentation and classification, respectively.
  • Keywords
    chemical engineering computing; document image processing; image classification; image segmentation; matrix algebra; optical character recognition; OCR systems; automated chemical segmentation; bond electron matrix; chemical classification; chemical database; chemical equations; heterogeneous document images; histogram analysis; mathematical equation segmentation; morphological operators; nonchemical equations; open source OCR engine; spatial properties; Accuracy; Chemicals; Equations; Histograms; Image segmentation; Mathematical model; Optical character recognition software; Mathematical symbols; histogram analysis; morphological operation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference on
  • Conference_Location
    Kolkata
  • Type
    conf
  • DOI
    10.1109/ICAPR.2015.7050678
  • Filename
    7050678