DocumentCode
1798623
Title
Text line extraction for historical document images using steerable directional filters
Author
Alaql, Omar ; Cheng Chang Lu
Author_Institution
Dept. of Comput. Sci., Kent State Univ., Kent, OH, USA
fYear
2014
fDate
7-9 July 2014
Firstpage
312
Lastpage
317
Abstract
Libraries and national archives hold vast numbers of valuable historical documents that have not yet been exploited electronically. Analyzing historical documents presents specific difficulties compared with other types of handwritten documents. Because of the low quality and complexity of these documents, their analysis remains an open research field. One of the major steps in analyzing them is automatic text line extraction, which influences the accuracy of text recognition. The Center for Unified Biometrics and Sensors (CUBS) proposed one of the best-known approaches to text line extraction. Starting from the concepts of the CUBS approach, this paper proposes an approach to extracting text lines from historical document images. The proposed approach is based on three local connectivity maps. The first holds the orientation angles of the text lines and is generated with a dynamic steerable directional filter. A mode filter is then applied to this map to obtain the paragraph map of the document. Based on the values of the paragraph map, the adaptive local connectivity map (ALCM) is generated with a static steerable directional filter to estimate the locations of the text lines. The proposed approach solves the ALCM binarization problem of the CUBS approach and has the added advantage of extracting the paragraphs in the document in addition to segmenting the text lines.
Keywords
biometrics (access control); document image processing; feature extraction; history; libraries; records management; text analysis; ALCM; CUBS; Center for Unified Biometrics and Sensors; adaptive local connectivity map; document analysis; handwritten documents; historical document images; national archives; paragraph map; static steerable directional filter; text line extraction; text recognition; Educational institutions; Filtering algorithms; Image segmentation; Kernel; Level set; Optical filters; adaptive local connectivity map (ALCM); local connectivity directions map (LCDM)
fLanguage
English
Publisher
IEEE
Conference_Titel
Audio, Language and Image Processing (ICALIP), 2014 International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4799-3902-2
Type
conf
DOI
10.1109/ICALIP.2014.7009807
Filename
7009807