  • DocumentCode
    860160
  • Title
    Recognizing mathematical expressions using tree transformation
  • Author
    Zanibbi, Richard; Blostein, Dorothea; Cordy, James R.
  • Author_Institution
    Sch. of Comput., Queen's Univ., Kingston, Ont., Canada
  • Volume
    24
  • Issue
    11
  • fYear
    2002
  • fDate
    11/1/2002
  • Firstpage
    1455
  • Lastpage
    1467
  • Abstract
    We describe a robust and efficient system for recognizing typeset and handwritten mathematical notation. Given a list of symbols and their bounding boxes, the system analyzes an expression in three successive passes. The Layout Pass constructs a Baseline Structure Tree (BST) describing the two-dimensional arrangement of the input symbols. Reading order and operator dominance allow symbol layout to be recognized efficiently even when symbols deviate greatly from their ideal positions. Next, the Lexical Pass produces a Lexed BST from the initial BST by grouping tokens composed of multiple input symbols; these include decimal numbers, function names, and symbols composed of nonoverlapping primitives such as "=". The Lexical Pass also labels vertical structures such as fractions and accents. The Lexed BST is translated into LaTeX. Additional processing, needed to produce output for symbolic algebra systems, is carried out in the Expression Analysis Pass, which translates the Lexed BST into an Operator Tree describing the order and scope of operations in the input expression. The tree manipulations used in each pass are represented compactly as tree transformations. The compiler-like architecture of the system allows robust handling of unexpected input, increases the scalability of the system, and provides the groundwork for handling dialects of mathematical notation.
  • Keywords
    document image processing; handwritten character recognition; optical character recognition; symbol manipulation; tree data structures; Baseline Structure Tree; Expression Analysis Pass; LaTeX; Layout Pass; Lexical Pass; Operator Tree; compiler-like architecture; diagram recognition; document image analysis; handwritten mathematical notation recognition; mathematical expression recognition; scalability; symbolic algebra systems; symbols; tree data structure; tree transformation; typeset mathematical notation recognition; Algebra; Binary search trees; Computer Society; Handwriting recognition; Image recognition; Mathematics; Pattern recognition; Robustness; Tree graphs; Typesetting
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2002.1046157
  • Filename
    1046157
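The abstract above describes a Lexical Pass that groups multi-symbol tokens in a BST (decimal numbers, function names), labels fractions, and then translates the Lexed BST into LaTeX. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes the Layout Pass has already produced the BST; the names Node, lex, to_latex and FUNCTION_NAMES are hypothetical, as are the region labels "SUPER", "SUBSC", "ABOVE" and "BELOW" and the convention that a fraction line arrives as the symbol "-". It uses plain recursive Python functions in place of the tree-transformation rules the abstract describes, and it omits grouping of nonoverlapping primitives such as "=", accents, and the Expression Analysis Pass.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical lexicon of multi-letter function names (an assumption for this sketch).
FUNCTION_NAMES = ("sin", "cos", "tan", "log", "exp")


@dataclass
class Node:
    """One BST node: a symbol plus child baselines keyed by (hypothetical) region name."""
    symbol: str
    regions: Dict[str, List["Node"]] = field(default_factory=dict)
    kind: str = "symbol"  # set by lex(): "symbol", "number", "function" or "fraction"


def _is_numeric(symbol: str) -> bool:
    return symbol.isdigit() or symbol == "."


def _lex_regions(node: Node) -> Dict[str, List[Node]]:
    # Apply the lexical step recursively to every child baseline.
    return {name: lex(child) for name, child in node.regions.items()}


def lex(baseline: List[Node]) -> List[Node]:
    """Return a lexed copy of one baseline, grouping multi-symbol tokens left to right."""
    out: List[Node] = []
    i = 0
    while i < len(baseline):
        node = baseline[i]
        if _is_numeric(node.symbol):
            # Merge a run of digits and '.' into one number token; a symbol with
            # attached regions (e.g. a superscript) is the last one in the run.
            j = i
            while j < len(baseline) and _is_numeric(baseline[j].symbol):
                j += 1
                if baseline[j - 1].regions:
                    break
            text = "".join(n.symbol for n in baseline[i:j])
            out.append(Node(text, _lex_regions(baseline[j - 1]), "number"))
            i = j
            continue
        # Group letters spelling a known function name; only the last letter
        # may carry regions (e.g. the superscript in sin^2).
        name = next((f for f in FUNCTION_NAMES
                     if "".join(n.symbol for n in baseline[i:i + len(f)]) == f
                     and not any(n.regions for n in baseline[i:i + len(f) - 1])), None)
        if name is not None:
            last = baseline[i + len(name) - 1]
            out.append(Node(name, _lex_regions(last), "function"))
            i += len(name)
            continue
        kind = "symbol"
        if node.symbol == "-" and "ABOVE" in node.regions and "BELOW" in node.regions:
            kind = "fraction"  # a horizontal line with baselines above and below it
        out.append(Node(node.symbol, _lex_regions(node), kind))
        i += 1
    return out


def to_latex(baseline: List[Node]) -> str:
    """Translate a lexed baseline into a LaTeX string."""
    parts = []
    for node in baseline:
        if node.kind == "fraction":
            tex = "\\frac{%s}{%s}" % (to_latex(node.regions["ABOVE"]),
                                       to_latex(node.regions["BELOW"]))
        elif node.kind == "function":
            tex = "\\" + node.symbol
        else:
            tex = node.symbol
        if "SUBSC" in node.regions:
            tex += "_{%s}" % to_latex(node.regions["SUBSC"])
        if "SUPER" in node.regions:
            tex += "^{%s}" % to_latex(node.regions["SUPER"])
        parts.append(tex)
    return " ".join(parts)


if __name__ == "__main__":
    # Symbols s, i, n, x with superscript 2, +, and a fraction line over 10 and 3.5.
    expr = [Node("s"), Node("i"), Node("n"),
            Node("x", {"SUPER": [Node("2")]}),
            Node("+"),
            Node("-", {"ABOVE": [Node("1"), Node("0")],
                       "BELOW": [Node("3"), Node("."), Node("5")]})]
    print(to_latex(lex(expr)))  # \sin x^{2} + \frac{10}{3.5}
```

Keeping lexing separate from translation mirrors the pass structure in the abstract: to_latex never needs to know that "1" and "0" were once separate symbols, because lex has already merged them into one number token.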