Title :
Image-based word recognition in oriental language document images
Author :
Zhu, Jason ; Hull, Jonathan J.
Author_Institution :
Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
Abstract :
An algorithm for word recognition in oriental languages such as Chinese, Japanese, and Korean is presented. The objective is to recognize words that are composed of a number of consecutive characters in document images where there are no explicit, visually defined word boundaries. The technique exploits the redundancy in these languages that is expressed by the difference between the number of possible character strings of a fixed length and the number of legal words of that length. Sequences of character images are matched simultaneously to lists of legal words and illegal strings that are likely to occur. A word is located if its image is more likely to occur in the current context than any of the illegal strings that are visually similar to it. No intermediate character recognition step is used. The application of contextual information directly to the interpretation of features extracted from the image overcomes noise that could have made isolated character recognition impossible and the location of words with conventional postprocessing algorithms difficult. Experimental results are presented that show the ability of this algorithm to correctly recognize text in the presence of noise.
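The decision rule in the abstract (a word is located only if it is more likely than every visually similar illegal string) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature vectors, the squared-distance score, the log-prior weighting, and the names `string_score` and `locate_word` are all assumptions made for illustration; the abstract does not specify the features or the likelihood model.

```python
import math

def string_score(window, string, templates, log_prior):
    """Score a candidate string against a window of character-image features.

    Assumed scoring: log prior of the string minus the summed squared
    distance between each image's features and the template of the
    character hypothesized at that position. No per-character decision
    is made; the whole string is scored at once.
    """
    total = log_prior
    for features, char in zip(window, string):
        template = templates[char]
        total -= sum((f - t) ** 2 for f, t in zip(features, template))
    return total

def locate_word(window, legal, illegal, templates):
    """Return the best legal word if it outscores every illegal string.

    `legal` and `illegal` map strings to log priors (hypothetical inputs).
    Returns None when an illegal string scores at least as well, meaning
    no word is located in this window.
    """
    n = len(window)
    best_word, best_score = None, -math.inf
    for word, log_prior in legal.items():
        if len(word) != n:
            continue
        score = string_score(window, word, templates, log_prior)
        if score > best_score:
            best_word, best_score = word, score
    best_illegal = max(
        (string_score(window, s, templates, lp)
         for s, lp in illegal.items() if len(s) == n),
        default=-math.inf,
    )
    if best_word is not None and best_score > best_illegal:
        return best_word
    return None
```

In this sketch the legal and illegal lists compete for the same window, so context enters through the priors before any individual character is classified. Sliding the window over consecutive character images would then yield word locations without explicit word boundaries.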
Keywords :
optical character recognition; Chinese; Japanese; Korean; character image sequences; feature extraction; image-based word recognition; noise; oriental language document images; redundancy; Character recognition; Data mining; Degradation; Facsimile; Feature extraction; Image recognition; Law; Legal factors; Natural languages; Text recognition;
Conference_Title :
Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on
Conference_Location :
Jerusalem
Print_ISBN :
0-8186-6270-0
DOI :
10.1109/ICPR.1994.576924