• DocumentCode
    518228
  • Title
    A modified recursive x-y cut algorithm for solving block ordering problems
  • Author
    Sutheebanjard, Phaisarn; Premchaiswadi, Wichian
  • Author_Institution
    Grad. Sch. of Inf. Technol., Siam Univ., Bangkok, Thailand
  • Volume
    3
  • fYear
    2010
  • fDate
    16-18 April 2010
  • Abstract
    To achieve the best results from an OCR system, the pre-processing steps must be performed with a high degree of accuracy and reliability. There are two critically important steps in the OCR pre-processing phase. First, blocks must be extracted from each page of the scanned document. Second, all blocks resulting from the first step must be arranged in the correct reading order. One of the most notable techniques for block ordering in the second step is the recursive x-y cut (RXYC) algorithm. This technique works accurately only on documents with a simple page layout; on documents with complex page layouts it produces incorrect block ordering. This paper proposes a modified recursive x-y cut algorithm for solving block ordering problems in documents with complex page layouts. The proposed algorithm can solve problems such as (1) the overlapping block problem; (2) the blocks overlay problem; and (3) the L-shaped block problem.
  • Keywords
    document image processing; optical character recognition; L-Shaped block problem; OCR preprocessing phase; OCR system; block ordering problem; blocks overlay problem; complex page layouts; modified recursive x-y cut algorithm; overlapping block problem; scanned document; Character recognition; Costs; Data mining; Electronic publishing; Information technology; Optical character recognition software; Problem-solving; block ordering; recursive x-y cut
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering and Technology (ICCET), 2010 2nd International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-6347-3
  • Type
    conf
  • DOI
    10.1109/ICCET.2010.5485882
  • Filename
    5485882
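
The abstract names the recursive x-y cut (RXYC) algorithm as the baseline for block ordering. As a rough illustration only, the following is a minimal sketch of the classic RXYC idea, not the modified algorithm the paper proposes, whose details are not given in this record. It assumes each block is an axis-aligned bounding box `(left, top, right, bottom)` with the origin at the top-left of the page; the names `Box`, `split_by_gaps`, and `xy_cut_order` are hypothetical and chosen for this example.

```python
from typing import List, Tuple

# Assumed block representation: (left, top, right, bottom), origin at top-left.
Box = Tuple[int, int, int, int]


def split_by_gaps(boxes: List[Box], lo: int, hi: int) -> List[List[Box]]:
    """Group boxes that are separated by empty gaps along one axis.

    `lo` and `hi` are the tuple indices of the start and end coordinate
    on that axis: (1, 3) for the y axis, (0, 2) for the x axis.
    """
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # farthest extent covered by the current group
    for box in ordered[1:]:
        if box[lo] >= reach:
            # Nothing in the current group extends past this box's start:
            # a cut line fits here, so begin a new group.
            groups.append([box])
            reach = box[hi]
        else:
            groups[-1].append(box)
            reach = max(reach, box[hi])
    return groups


def xy_cut_order(boxes: List[Box]) -> List[Box]:
    """Return boxes in reading order using classic recursive x-y cuts."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try horizontal cuts first (split along y), then vertical cuts (along x).
    for lo, hi in ((1, 3), (0, 2)):
        groups = split_by_gaps(boxes, lo, hi)
        if len(groups) > 1:
            return [b for group in groups for b in xy_cut_order(group)]
    # No clean cut exists in either direction (e.g. overlapping blocks).
    # This fallback is an assumption of the sketch: sort by top, then left.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Usage: a title spanning the page above two columns of two blocks each.
title = (0, 0, 100, 10)
left1, left2 = (0, 20, 45, 50), (0, 60, 45, 90)
right1, right2 = (55, 20, 100, 62), (55, 70, 100, 90)
for block in xy_cut_order([right2, left1, title, right1, left2]):
    print(block)
```

The fallback branch is where the plain algorithm gives up: when blocks overlap, no straight cut separates them, which is the kind of complex layout the abstract says the classic method orders incorrectly.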