• DocumentCode
    3486529
  • Title
    A Text Line Detection Method for Mathematical Formula Recognition
  • Author
    Xiaoyan Lin; Liangcai Gao; Zhi Tang; James Baker; Mohamed Alkalai; Volker Sorge
  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    339
  • Lastpage
    343
  • Abstract
    Text line detection is a prerequisite for mathematical formula recognition. However, because of the two-dimensional structure of mathematics, existing segmentation methods such as Projection Profile Cutting (PPC) or white space analysis often produce incorrectly segmented text lines. These incorrectly detected text lines adversely affect mathematical formula recognition, with errors propagating through subsequent processing. We propose a text line detection method, aimed at mathematical formula recognition, that produces reliable line segmentation. Starting from the results produced by PPC, a learning-based merging strategy recombines incorrectly split text lines. The merging strategy uses layout and text features of each text line, and of successive pairs of lines, to detect the incorrect splits. Experimental results show that the proposed approach performs well in detecting text lines in mathematical documents. Furthermore, adopting the proposed text line detection method significantly reduces the error rate in mathematical formula identification.
  • Keywords
    image segmentation; learning (artificial intelligence); text analysis; text detection; PPC; error rate; incorrectly split text line detection method; layout features; learning-based merging strategy; mathematical documents; mathematical formula identification; mathematical formula recognition; text features; text line segmentation; two-dimensional mathematics structures; Accuracy; Feature extraction; Layout; Merging; Testing; Text recognition; Training; Text line detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.75
  • Filename
    6628640