• DocumentCode
    1063791
  • Title
    A survey of methods and strategies in character segmentation
  • Author
    Casey, Richard G.; Lecolinet, Eric
  • Author_Institution
    IBM Almaden Res. Center, San Jose, CA, USA
  • Volume
    18
  • Issue
    7
  • fYear
    1996
  • fDate
    7/1/1996
  • Firstpage
    690
  • Lastpage
    706
  • Abstract
    Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the “classical” approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called “dissection.” The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly, by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.
  • Keywords
    hidden Markov models; image segmentation; optical character recognition; OCR process; character segmentation; connected character strings; dissection; holistic approaches; isolated characters; recognition rates; unconstrained printed; words; written text; Character recognition; Error analysis; Feature extraction; Hidden Markov models; Image analysis; Image recognition; Image segmentation; Optical character recognition software; Pattern recognition; Pipelines
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/34.506792
  • Filename
    506792