• DocumentCode
    1798623
  • Title
    Text line extraction for historical document images using steerable directional filters
  • Author
    Alaql, Omar ; Cheng Chang Lu
  • Author_Institution
    Dept. of Comput. Sci., Kent State Univ. Kent, Kent, OH, USA
  • fYear
    2014
  • fDate
    7-9 July 2014
  • Firstpage
    312
  • Lastpage
    317
  • Abstract
    Vast numbers of valuable historical documents in libraries and national archives have not yet been exploited electronically. Analyzing historical documents presents specific difficulties compared with other types of handwritten documents. Because of the low quality and complexity of these documents, their analysis remains an open research field. One of the major steps in analyzing these documents is automatic text line extraction, which influences the accuracy of text recognition. The Center for Unified Biometrics and Sensors (CUBS) proposed one of the best-known approaches to text line extraction. In this paper, starting from the concepts of the CUBS approach, we propose an approach to extract text lines from historical document images. The proposed approach is based on three local connectivity maps. The first holds the orientation angles of the text lines and is generated using a dynamic steerable directional filter. This map is then modified with a mode filter to determine the paragraph map of the document. Based on the values of the paragraph map, the adaptive local connectivity map (ALCM) is generated using a static steerable directional filter to estimate the locations of the text lines. The proposed approach solves the ALCM binarization problem of the CUBS approach and has the added advantage of extracting the paragraphs in the document in addition to segmenting the text lines.
  • Keywords
    biometrics (access control); document image processing; feature extraction; history; libraries; records management; text analysis; ALCM; CUBS; Center for Unified Biometrics and Sensors; adaptive local connectivity map; document analysis; handwritten documents; historical document images; national archives; paragraph map; static steerable directional filter; text line extraction; text recognition; Educational institutions; Filtering algorithms; Image segmentation; Kernel; Level set; Optical filters; adaptive local connectivity map (ALCM); local connectivity directions map (LCDM)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Audio, Language and Image Processing (ICALIP), 2014 International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4799-3902-2
  • Type
    conf
  • DOI
    10.1109/ICALIP.2014.7009807
  • Filename
    7009807