Title :
Machine printed character segmentation method using side profiles
Author :
Jung, Min-Chul ; Shin, Yong-Chul ; Srihari, Sargur N.
Author_Institution :
Center of Excellence for Document Anal. & Recognition, State Univ. of New York, Buffalo, NY, USA
Abstract :
A segmentation method for machine printed character strings of arbitrary length is proposed. It uses recognition-based segmentation, combined with heuristic and holistic methods. The merged part of touching characters produces patterns whose shapes differ from those of the primitive character patterns. However, the leftmost and rightmost patterns in the touching characters are not affected by the touching. First, the algorithm constructs a line adjacency graph (LAG) from a word image. Blobs are found as connected components of the LAG, and small dot noise is removed. Second, because an English word can be divided into three typographical zones (the ascender, the x-height, and the descender), the locations of the connected components among those zones are also examined. Third, the right profile of the touching characters is compared with those of the sample characters in the prototype, and the touching characters are then segmented using the width of one of the candidates in the prototype. Finally, the upward, downward, and left profiles of the segmented pattern are compared with those of the candidate. The third and final steps are repeated until the segmentation is confirmed by successful matching of the resulting character patterns. The method has been tested on touching characters in Times and Helvetica, both proportional-pitch fonts, and the results indicate that it is promising.
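The third and final steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3-row toy glyphs, the sum-of-absolute-differences distance, the `tol` threshold, and the greedy right-to-left loop without backtracking are all assumptions made for illustration. Images are assumed to be binary lists of rows, already normalised to the same height as the prototypes; the LAG construction and zone analysis are omitted.

```python
def left_profile(img):
    """Per row: number of background pixels before the first ink pixel
    (the full row width if the row has no ink)."""
    prof = []
    for row in img:
        d = len(row)
        for k, px in enumerate(row):
            if px:
                d = k
                break
        prof.append(d)
    return prof


def profile(img, side):
    """Profile of a binary image seen from 'left', 'right', 'up' or 'down'."""
    if side == "left":
        return left_profile(img)
    if side == "right":
        return left_profile([row[::-1] for row in img])
    cols = [list(c) for c in zip(*img)]  # transpose: columns become rows
    if side == "up":
        return left_profile(cols)
    if side == "down":
        return left_profile([c[::-1] for c in cols])
    raise ValueError(side)


def distance(p, q):
    """Sum of absolute differences (an assumed similarity measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def split_rightmost(img, prototypes, tol=0):
    """Match the right profile against each prototype, cut at the candidate's
    width, then verify the cut piece with its other three profiles."""
    for label, proto in prototypes.items():
        w = len(proto[0])
        if w > len(img[0]):
            continue
        # Clip depths to the candidate width so the neighbour is ignored.
        right = [min(d, w) for d in profile(img, "right")]
        if distance(right, profile(proto, "right")) > tol:
            continue
        piece = [row[-w:] for row in img]
        rest = [row[:-w] for row in img]
        if all(distance(profile(piece, s), profile(proto, s)) <= tol
               for s in ("up", "down", "left")):
            return label, rest, piece
    return None


def segment(img, prototypes, tol=0):
    """Peel characters off the right end until nothing is left."""
    labels = []
    while img and img[0]:
        hit = split_rightmost(img, prototypes, tol)
        if hit is None:
            return None  # no candidate confirmed; this sketch does not backtrack
        label, img, _ = hit
        labels.append(label)
    return labels[::-1]


# Hypothetical toy glyphs, three rows high.
T = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
L = [[1, 0], [1, 0], [1, 1]]
PROTOTYPES = {"T": T, "L": L}
# "TL" with the two glyphs touching in the top row.
WORD = [t + l for t, l in zip(T, L)]
print(segment(WORD, PROTOTYPES))
```

Clipping the right profile to the candidate width is one way to reflect the observation that the rightmost pattern is unaffected by the touching: rows where the candidate has no ink are not compared against the neighbouring character.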
Keywords :
graph theory; image segmentation; optical character recognition; Helvetica font; Times font; blobs; downward profile; heuristic methods; holistic methods; left profile; line adjacency graph; machine printed character segmentation method; primitive character patterns; proportional pitch fonts; recognition-based segmentation; right profile; side profiles; touching characters; typographical zones; upward profile; word image; Character generation; Character recognition; Image segmentation; Optical character recognition software; Pattern matching; Pattern recognition; Prototypes; Shape; Testing; Text analysis;
Conference_Title :
Systems, Man, and Cybernetics, 1999. IEEE SMC '99 Conference Proceedings. 1999 IEEE International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-7803-5731-0
DOI :
10.1109/ICSMC.1999.816665