Title :
Research on Optimization Segmentation Algorithm for Chinese/English Mixed Character Image in OCR
Author :
Liu Mingzhu ; Suo Yuxiu ; Ding Yinan
Author_Institution :
Higher Educ. Key Lab. for Meas. & Control Technol. & Instrumentations of Heilongjiang Province, Harbin Univ. of Sci. & Technol., Harbin, China
Abstract :
To address the low recognition accuracy for mixed Chinese/English text, this paper studies an optimized segmentation algorithm for Chinese/English mixed characters in an OCR system. Text images are first roughly segmented using the vertical projection method. Then, following character segmentation theory, the connected regions are extracted and classified as Chinese character regions, Chinese character component regions, or English letter and digit regions. In the Chinese character component regions, traditional component merging algorithms leave some Chinese characters incompletely merged, so a unit merging algorithm based on feedback recognition is presented to merge the components. In the Chinese character, English letter, and digit regions, touching (adhesion) characters can cause segmentation errors, so they are detected and re-segmented using the geometric features of the characters. Segmentation tests on mixed text show that, when recognizing Chinese/English mixed characters, the optimized segmentation algorithm has a clear advantage in recognition accuracy over traditional algorithms, especially for Chinese characters composed of left and right components.
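The rough segmentation step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_gap` parameter, and the width-to-height threshold `max_ratio` are all hypothetical, and the input is assumed to be a binarized single text line given as rows of 0 (background) and 1 (ink). The width check is only one possible geometric feature for flagging suspected adhesion characters; the abstract does not specify which features the paper uses.

```python
from typing import List, Tuple

def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]

def rough_segments(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a text line into half-open column spans [start, end).

    A span ends once at least `min_gap` consecutive blank columns follow it.
    `min_gap` is a hypothetical tuning parameter, not taken from the paper.
    """
    profile = vertical_projection(image)
    segments = []
    start = None   # first column of the span being built
    gap = 0        # blank columns seen since the last ink column
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last span, dropping any trailing blank columns.
        segments.append((start, len(profile) - gap))
    return segments

def flag_wide_segments(segments: List[Tuple[int, int]], height: int,
                       max_ratio: float = 1.2) -> List[Tuple[int, int]]:
    """Return spans wider than max_ratio * height.

    Assumption for illustration only: a span much wider than the line
    height may contain touching characters and needs re-segmentation.
    """
    return [(s, e) for (s, e) in segments if (e - s) > max_ratio * height]

line = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 1],
]
print(rough_segments(line, min_gap=1))
print(rough_segments(line, min_gap=2))
```

With `min_gap=1` the last two strokes are split into separate spans, while `min_gap=2` keeps them together. This shows why a Chinese character made of left and right components can be over-segmented by projection alone and needs a later merging step.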
Keywords :
feedback; image segmentation; natural language processing; optical character recognition; text analysis; Chinese character component connectivity regional component merging algorithms; Chinese-English mixed character image; English number connectivity regional; OCR system; adhesion character; character segmentation theory; feedback recognition; optimization segmentation algorithm; rough segment; text image segmentation; traditional Chinese character component merging algorithms; unit merging algorithm; vertical projection method; Accuracy; Adhesives; Character recognition; Image segmentation; Merging; Optical character recognition software; Optimization; Chinese/English mixed character; OCR; feedback recognition; unit merging;
Conference_Title :
Instrumentation and Measurement, Computer, Communication and Control (IMCCC), 2014 Fourth International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4799-6574-8
DOI :
10.1109/IMCCC.2014.162