• DocumentCode
    3307752
  • Title
    A trainable, single-pass algorithm for column segmentation
  • Author
    Sylwester, Don; Seth, Sharad
  • Author_Institution
    Dept. of Comput. Sci., Concordia Coll., Seward, NE, USA
  • Volume
    2
  • fYear
    1995
  • fDate
    14-16 Aug 1995
  • Firstpage
    615
  • Abstract
    Column segmentation logically precedes OCR in the document analysis process. The trainable algorithm XYCUT relies on horizontal and vertical binary profiles to produce an XY-tree representing the column structure of a page of a technical document in a single pass through the bit image. Training against ground truth adjusts a single, resolution-independent parameter using only local information and guided by an edit distance function. The algorithm correctly segments the page image for a (fairly) wide range of parameter values, although small, local, and repairable errors may be made, an effect measured by a repair cost function.
  • Keywords
    document image processing; errors; image representation; image segmentation; learning (artificial intelligence); optical character recognition; technical presentation; OCR; XY-tree; XYCUT; column segmentation; document analysis; edit distance function; ground truth; horizontal profiles; page image segmentation; page structure; repair cost function; resolution independent parameter; technical document; trainable single-pass algorithm; vertical binary profiles; Algorithm design and analysis; Computer science; Cost function; Educational institutions; Image segmentation; Optical character recognition software; Pixel; Robustness; Size measurement; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-8186-7128-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.1995.601971
  • Filename
    601971
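
Illustrative note on the abstract: the idea of building an XY-tree from horizontal and vertical binary profiles can be sketched as below. This is a generic recursive XY-cut written for illustration only, not the single-pass XYCUT algorithm of the paper, whose details the abstract does not give. The names `Node`, `binary_profile`, `ink_runs`, `xy_cut`, and the gap threshold `min_gap` are all assumptions introduced here; `min_gap` merely stands in for the kind of single tunable parameter the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


@dataclass
class Node:
    """One XY-tree node: 'H' = blocks stacked vertically, 'V' = side-by-side columns."""
    box: Box
    cut: str = "leaf"
    children: List["Node"] = field(default_factory=list)


def binary_profile(img: List[List[int]], box: Box, axis: str) -> List[int]:
    """One bit per row (axis='rows') or column (axis='cols'): 1 if any ink lies in the box."""
    top, left, bottom, right = box
    if axis == "rows":
        return [int(any(img[r][left:right])) for r in range(top, bottom)]
    return [int(any(img[r][c] for r in range(top, bottom))) for c in range(left, right)]


def ink_runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return ink runs [start, end); runs separated by fewer than min_gap blanks are merged."""
    runs: List[List[int]] = []
    start = None
    for i, bit in enumerate(profile):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(profile)])
    merged: List[List[int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1][1] = run[1]  # gap too narrow to count as a separator
        else:
            merged.append(run)
    return [(s, e) for s, e in merged]


def xy_cut(img, box: Box, min_gap: int, axis: str = "cols", other_failed: bool = False) -> Node:
    """Cut along `axis` if possible; otherwise try the other axis once; otherwise stop."""
    top, left, bottom, right = box
    other = "rows" if axis == "cols" else "cols"
    runs = ink_runs(binary_profile(img, box, axis), min_gap)
    if len(runs) <= 1:
        if other_failed:
            return Node(box)  # no usable gap in either direction
        return xy_cut(img, box, min_gap, other, True)
    children = []
    for start, end in runs:
        if axis == "cols":
            child_box = (top, left + start, bottom, left + end)
        else:
            child_box = (top + start, left, top + end, right)
        children.append(xy_cut(img, child_box, min_gap, other))
    return Node(box, "V" if axis == "cols" else "H", children)


# Toy page: a full-width title line, two blank rows, then two text columns.
page = [[1] * 11] + [[0] * 11 for _ in range(2)] + [
    [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1] for _ in range(3)
]
tree = xy_cut(page, (0, 0, len(page), len(page[0])), min_gap=2)
```

On this toy page the root is a horizontal cut separating the title from the body, and the body is then cut vertically into two columns. Raising `min_gap` above the width of the white space makes the gaps disappear and the whole page collapses into a single leaf, which is one way to see why a threshold of this kind would need tuning against ground truth.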