DocumentCode :
3140375
Title :
Character extraction from noisy background for an automatic reference system
Author :
Negishi, Hideyuki ; Kato, Jien ; Hase, Hiroyuki ; Watanabe, Toyohide
Author_Institution :
Dept. of Intellectual Inf. Syst. Eng., Toyama Univ., Japan
fYear :
1999
fDate :
20-22 Sep 1999
Firstpage :
143
Lastpage :
146
Abstract :
It is important to provide digitized manuscripts of old literature (in page image form) and their electronic text (in full-text form) on the Internet, with an automatic reference mechanism between the images and the text. As an essential step toward creating such an automatic reference system, this paper addresses the problem of extracting character areas from page images of old handwritten manuscripts. Page images of old manuscripts are usually very dirty and considerably large in size. To overcome the first problem, we propose a new method for separating characters from a noisy background, since conventional threshold selection techniques cannot cope with images in which the gray levels of the character parts overlap those of the background. To solve the second problem, we propose an approach based on a downscaled image and a recursive labeling method for word extraction. This approach is suitable for large images because it saves memory and reduces processing time.
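The abstract does not give the details of the downscaling or labeling steps, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a grayscale image stored as nested lists, a hypothetical block-minimum downscaling rule, and a plain fixed threshold as a stand-in for the paper's separation method (which the abstract says goes beyond conventional thresholding). The function names `downscale_min` and `label_regions` are illustrative, and an explicit stack replaces recursion to avoid Python's recursion limit.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def downscale_min(img: Image, factor: int) -> Image:
    """Shrink by `factor`, keeping the darkest pixel of each block
    (an assumed rule, chosen so that thin dark strokes survive)."""
    h, w = len(img), len(img[0])
    return [
        [
            min(img[y][x]
                for y in range(by, min(by + factor, h))
                for x in range(bx, min(bx + factor, w)))
            for bx in range(0, w, factor)
        ]
        for by in range(0, h, factor)
    ]


def label_regions(img: Image, threshold: int, factor: int) -> List[Box]:
    """Label dark connected regions on the downscaled image and return
    their bounding boxes in full-resolution pixel coordinates."""
    small = downscale_min(img, factor)
    h, w = len(small), len(small[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or small[sy][sx] >= threshold:
                continue
            # Flood fill with an explicit stack (8-connectivity).
            stack = [(sy, sx)]
            seen[sy][sx] = True
            top, left, bottom, right = sy, sx, sy, sx
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and small[ny][nx] < threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Map the box back to the original image, clipped to its size.
            boxes.append((top * factor,
                          left * factor,
                          min((bottom + 1) * factor, len(img)) - 1,
                          min((right + 1) * factor, len(img[0])) - 1))
    return boxes
```

Because labeling runs on an image that is `factor` squared times smaller, both the visited-pixel map and the fill work shrink accordingly, which is the kind of memory and time saving the abstract refers to.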
Keywords :
Internet; document image processing; feature extraction; handwritten character recognition; automatic reference system; character extraction; character recognition; digitized manuscripts; handwritten documents; literature; noisy background images; recursive labeling; threshold selection; word extraction; Background noise; Boolean functions; Data structures
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
Type :
conf
DOI :
10.1109/ICDAR.1999.791745
Filename :
791745