DocumentCode
1063791
Title
A survey of methods and strategies in character segmentation
Author
Casey, Richard G. ; Lecolinet, Eric
Author_Institution
IBM Almaden Res. Center, San Jose, CA, USA
Volume
18
Issue
7
fYear
1996
fDate
7/1/1996
Firstpage
690
Lastpage
706
Abstract
Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the “classical” approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called “dissection.” The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.
Keywords
hidden Markov models; image segmentation; optical character recognition; OCR process; character segmentation; connected character strings; dissection; holistic approaches; isolated characters; recognition rates; unconstrained printed; words; written text; Character recognition; Error analysis; Feature extraction; Hidden Markov models; Image analysis; Image recognition; Image segmentation; Optical character recognition software; Pattern recognition; Pipelines
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/34.506792
Filename
506792