• DocumentCode
    2145035
  • Title

    Language-Independent Text Lines Extraction Using Seam Carving

  • Author

    Saabni, Raid ; El-Sana, Jihad

  • Author_Institution
    Triangle R&D Center, Kafr Qara, Israel
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    563
  • Lastpage
    568
  • Abstract
    In this paper, we present a novel language-independent algorithm for extracting text-lines from handwritten document images. Our algorithm is based on the seam carving approach for content-aware image resizing. We adopted the signed distance transform to generate the energy map, where extreme points indicate the layout of text-lines. Dynamic programming is then used to compute the minimum-energy left-to-right paths (seams), which pass along the "middle" of the text-lines. Each path intersects a set of components, which determine the extracted text-line and estimate its height. The estimated height determines the text-line's region, which guides the splitting of touching components among consecutive lines. Unassigned components that fall within the region of a text-line are added to that line's component list. The components between two consecutive lines are processed once both lines have been extracted, and each is assigned to the closest text-line based on the attributes of the extracted lines and the sizes and positions of the components. Our experimental results on Arabic, Chinese, and English historical documents show that our approach manages to separate multi-skew text blocks into lines at high success rates.
  • Keywords
    document image processing; dynamic programming; feature extraction; handwriting recognition; natural language processing; transforms; English historical documents; distance transform; handwritten document images; image resizing; language independent text lines extraction; seam carving; Algorithm design and analysis; Arrays; Equations; Keyword search; Layout; Handwriting; Line Extraction; Multilingual; Signed Distance Transform
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2011.119
  • Filename
    6065374
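
The dynamic-programming step named in the abstract (computing a minimum-energy left-to-right seam over an energy map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `min_energy_horizontal_seam` is hypothetical, the energy map is a hand-made toy grid rather than one derived from a signed distance transform, and the rule that a seam may shift by at most one row between adjacent columns is the usual seam carving convention, assumed here rather than taken from the record.

```python
from typing import List, Sequence


def min_energy_horizontal_seam(energy: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each column, the row of a minimum-energy left-to-right seam.

    Assumption (standard seam carving, not confirmed by the abstract):
    the seam moves up or down by at most one row between adjacent columns.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[r][c]: cheapest cumulative energy of any seam ending at (r, c).
    cost = [[0.0] * cols for _ in range(rows)]
    # back[r][c]: row in column c - 1 from which that cheapest seam arrives.
    back = [[0] * cols for _ in range(rows)]

    for r in range(rows):
        cost[r][0] = energy[r][0]

    for c in range(1, cols):
        for r in range(rows):
            # Candidate predecessors: rows r - 1, r, r + 1, clipped to the grid.
            candidates = range(max(r - 1, 0), min(r + 2, rows))
            best = min(candidates, key=lambda p: cost[p][c - 1])
            cost[r][c] = energy[r][c] + cost[best][c - 1]
            back[r][c] = best

    # Start from the cheapest endpoint in the last column and walk backwards.
    r = min(range(rows), key=lambda p: cost[p][cols - 1])
    seam = [r]
    for c in range(cols - 1, 0, -1):
        r = back[r][c]
        seam.append(r)
    seam.reverse()
    return seam


# Toy energy map: low values trace a gently skewed "line" through the grid.
toy_energy = [
    [5, 5, 5, 5],
    [1, 5, 5, 1],
    [5, 1, 1, 5],
    [5, 5, 5, 5],
]
print(min_energy_horizontal_seam(toy_energy))  # prints [1, 2, 2, 1]
```

In the setting the abstract describes, the energy values would come from the signed distance transform of the document image, so that a low-cost seam follows the middle of a text-line; the toy grid above only stands in for that map.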