Title :
Correcting broken characters in the recognition of historical printed documents
Author :
Droettboom, Michael
Author_Institution :
Digital Knowledge Center, Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
We present a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. The technique uses graph combinatorics to rejoin the appropriate connected components. It has been applied to real data with successful results.
Keywords :
character sets; document image processing; graph theory; history; optical character recognition; OCR; broken character correction; connected component; graph combinatorics; historical printed document; Business; Carbon capture and storage; Character recognition; Combinatorial mathematics; Degradation; Optical character recognition software; Optical design; Printing; Robustness; Shape
Conference_Title :
Digital Libraries, 2003. Proceedings. 2003 Joint Conference on
Print_ISBN :
0-7695-1939-3
DOI :
10.1109/JCDL.2003.1204889